Thread: Re: [COMMITTERS] pgsql: Improved parallel make support

Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 11/12/2010 03:16 PM, Peter Eisentraut wrote:
>> Improved parallel make support

> Looks like this patch has pretty comprehensively broken the MSVC build
> system. I'll see what I can recover from the wreckage.

There are also at least three non-Windows buildfarm members failing like
so:

gmake -C src all
gmake[1]: Entering directory `/home/pgbuild/pgbuildfarm/HEAD/pgsql.6736/src'
gmake[1]: *** virtual memory exhausted.  Stop.
gmake[1]: Leaving directory `/home/pgbuild/pgbuildfarm/HEAD/pgsql.6736/src'
gmake: *** [all-src-recursive] Error 2

I think we may have pushed too far in terms of what actually works
reliably across different make versions.

            regards, tom lane

Re: [COMMITTERS] pgsql: Improved parallel make support

From
Andrew Dunstan
Date:

On 11/12/2010 11:25 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 11/12/2010 03:16 PM, Peter Eisentraut wrote:
>>> Improved parallel make support
>> Looks like this patch has pretty comprehensively broken the MSVC build
>> system. I'll see what I can recover from the wreckage.
> There are also at least three non-Windows buildfarm members failing like
> so:
>
> gmake -C src all
> gmake[1]: Entering directory `/home/pgbuild/pgbuildfarm/HEAD/pgsql.6736/src'
> gmake[1]: *** virtual memory exhausted.  Stop.
> gmake[1]: Leaving directory `/home/pgbuild/pgbuildfarm/HEAD/pgsql.6736/src'
> gmake: *** [all-src-recursive] Error 2
>
> I think we may have pushed too far in terms of what actually works
> reliably across different make versions.

Yeah, possibly. And now it looks like this has broken the Solaris 
buildfarm members too.

I'm curious to know how much all this buys us. One reason I haven't 
enabled parallel make in the buildfarm is that it interleaves the 
output, which can be a pain. And build speed isn't really the 
buildfarm's foremost concern anyway. I know waiting for a build can be 
mildly annoying (ccache can be a big help if you're building 
repeatedly). But I don't feel we need to squeeze every last pip out of 
the build system.

cheers

andrew


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I'm curious to know how much all this buys us.

It *would* be nice if "make -k" worked better.  I frequently run into
the fact that (with the pre-existing setup) a compile error in the
backend prevented make from proceeding with builds of interfaces/,
bin/, etc, meaning that that work still remains to be done after I've
finished fixing the backend error.

But having said that, I won't shed many tears if we have to revert this.

It looks like all the unhappy critters are getting the same "virtual
memory exhausted" error.  I wonder whether they are all using make 3.80
...
        regards, tom lane
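
A minimal sketch of why per-directory targets help "make -k", assuming a
hypothetical three-entry SUBDIRS list rather than the real PostgreSQL tree.
The target names all-loop and all-%-recurse are illustrative only.

```make
# Hypothetical subdirectory list; not the real PostgreSQL layout.
SUBDIRS = backend interfaces bin
RECURSE_TARGETS = $(SUBDIRS:%=all-%-recurse)

# Per-directory style: one target per subdirectory. Under "make -k", a
# failure in all-backend-recurse does not stop make from attempting
# all-interfaces-recurse and all-bin-recurse.
all: $(RECURSE_TARGETS)

$(RECURSE_TARGETS): all-%-recurse:
	$(MAKE) -C $* all

# Loop style: a single shell loop. The first failing sub-make ends the
# loop, so later directories are never attempted, even with "make -k".
all-loop:
	@for dir in $(SUBDIRS); do $(MAKE) -C $$dir all || exit 1; done

.PHONY: all all-loop $(RECURSE_TARGETS)
```

With the loop form, -k only takes effect inside each sub-make; the loop itself
still stops at the first directory that fails.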


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
BTW, there's another problem here: "make -j2" on my Mac blows up with
this on stderr:

ld: file not found: ../../../../../../src/backend/postgres
collect2: ld returned 1 exit status
make[3]: *** [ascii_and_mic.so] Error 1
make[2]: *** [all-ascii_and_mic-recursive] Error 2
make[1]: *** [all-backend/utils/mb/conversion_procs-recursive] Error 2
make[1]: *** Waiting for unfinished jobs....
In file included from gram.y:12101:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16242: warning: unused variable 'yyg'
make: *** [all-src-recursive] Error 2

Consulting stdout shows that indeed it's launched this series of jobs:

make -C backend/utils/mb/conversion_procs all
make -C ascii_and_mic all
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g  -I../../../../../../src/include   -c -o ascii_and_mic.o
ascii_and_mic.c
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g  -bundle -multiply_defined suppress -o ascii_and_mic.so
ascii_and_mic.o -L../../../../../../src/port -Wl,-dead_strip_dylibs
-bundle_loader ../../../../../../src/backend/postgres

immediately after completing the src/timezone build, before the backend
build is even well begun let alone finished.  So the parallel build
dependency interlocks are basically not working.  This machine has gmake
3.81.
        regards, tom lane


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Andrew Dunstan
Date:

On 11/13/2010 11:12 AM, Tom Lane wrote:
> It looks like all the unhappy critters are getting the same "virtual
> memory exhausted" error.  I wonder whether they are all using make 3.80
> ...

Maybe we need to put back make version logging. Interestingly, narwhal, 
the mingw machine that has reported, didn't complain. It's running 3.81.

cheers

andrew


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Dave Page
Date:
On Sat, Nov 13, 2010 at 4:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> It looks like all the unhappy critters are getting the same "virtual
> memory exhausted" error.  I wonder whether they are all using make 3.80

Both my Sparc and Intel Solaris critters have 3.80.


--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Peter Eisentraut
Date:
On Sat, 2010-11-13 at 11:06 -0500, Andrew Dunstan wrote:
> But I don't feel we need to squeeze every last pip out of 
> the build system.

Probably not on the buildfarm, but when you are developing, saving 20
seconds or 2 minutes per cycle can lead to hours saved.



Re: [COMMITTERS] pgsql: Improved parallel make support

From
Peter Eisentraut
Date:
On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
> It looks like all the unhappy critters are getting the same "virtual
> memory exhausted" error.  I wonder whether they are all using make
> 3.80 ...

It turns out that there is an unrelated bug in 3.80 that some Linux
distributions have patched around.  3.81 or 3.82 are OK.



Re: [COMMITTERS] pgsql: Improved parallel make support

From
Peter Eisentraut
Date:
On Sat, 2010-11-13 at 11:23 -0500, Tom Lane wrote:
> Consulting stdout shows that indeed it's launched this series of jobs:
> 
> make -C backend/utils/mb/conversion_procs all
> make -C ascii_and_mic all
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
> -fwrapv -g  -I../../../../../../src/include   -c -o ascii_and_mic.o
> ascii_and_mic.c
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
> -fwrapv -g  -bundle -multiply_defined suppress -o ascii_and_mic.so
> ascii_and_mic.o -L../../../../../../src/port -Wl,-dead_strip_dylibs
> -bundle_loader ../../../../../../src/backend/postgres
> 
> immediately after completing the src/timezone build, before the
> backend build is even well begun let alone finished.  So the parallel
> build dependency interlocks are basically not working.

On some platforms, you need to have backend/postgres built before any
dynamically loadable modules.  For those platforms, additional
dependencies will be necessary, I suppose.



Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
>> It looks like all the unhappy critters are getting the same "virtual
>> memory exhausted" error.  I wonder whether they are all using make
>> 3.80 ...

> It turns out that there is an unrelated bug in 3.80 that some Linux
> distributions have patched around.  3.81 or 3.82 are OK.

So what do you mean by "unrelated bug"?  Can we work around it?
        regards, tom lane


Re: [COMMITTERS] pgsql: Improved parallel make support

From
"Erik Rijkers"
Date:
On Sat, November 13, 2010 18:15, Peter Eisentraut wrote:
> On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
>> It looks like all the unhappy critters are getting the same "virtual
>> memory exhausted" error.  I wonder whether they are all using make
>> 3.80 ...
>
> It turns out that there is an unrelated bug in 3.80 that some Linux
> distributions have patched around.  3.81 or 3.82 are OK.
>

Just to mention another effect of the recent changes:

make 3.81, Centos 5.5

On a dual quad-core system where I used to build with -j 16, it now only succeeds with -j 8.

(I seem to remember that 16 as opposed to 8 shaved a couple of seconds off, although I'm not quite
sure anymore)

make -j 16 gives:

cc1: error: thread.c: No such file or directory
make[4]: *** [thread.o] Error 1
make[3]: *** [submake-libpq] Error 2
make[2]: *** [all-pg_ctl-recursive] Error 2
make[1]: *** [all-bin-recursive] Error 2
make[1]: *** Waiting for unfinished jobs....
Use of assignment to $[ is deprecated at ./parse.pl line 21.
In file included from gram.y:12101:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:16242: warning: unused variable ‘yyg’
Use of assignment to $[ is deprecated at ./check_rules.pl line 18.
make: *** [all-src-recursive] Error 2


(I see a similar effect on a dual-core Fedora system (2.6.27.5-117.fc10.i686), where -j 16 always
ran, but now it needs -j 4 or less; it also has make 3.81.)


Erik Rijkers





Re: [COMMITTERS] pgsql: Improved parallel make support

From
Peter Eisentraut
Date:
On Sat, 2010-11-13 at 12:20 -0500, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
> >> It looks like all the unhappy critters are getting the same "virtual
> >> memory exhausted" error.  I wonder whether they are all using make
> >> 3.80 ...
> 
> > It turns out that there is an unrelated bug in 3.80 that some Linux
> > distributions have patched around.  3.81 or 3.82 are OK.
> 
> So what do you mean by "unrelated bug"?  Can we work around it?

The information is fuzzy, but the problem has been reported around the
internet, and it appears to be related to the foreach function.  I think
I have an idea how to work around it, but I'll need some time.




Re: [COMMITTERS] pgsql: Improved parallel make support

From
Peter Eisentraut
Date:
On Sat, 2010-11-13 at 20:07 +0200, Peter Eisentraut wrote:
> On Sat, 2010-11-13 at 12:20 -0500, Tom Lane wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> > > On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
> > >> It looks like all the unhappy critters are getting the same "virtual
> > >> memory exhausted" error.  I wonder whether they are all using make
> > >> 3.80 ...
> > 
> > > It turns out that there is an unrelated bug in 3.80 that some Linux
> > > distributions have patched around.  3.81 or 3.82 are OK.
> > 
> > So what do you mean by "unrelated bug"?  Can we work around it?
> 
> The information is fuzzy, but the problem has been reported around the
> internet, and it appears to be related to the foreach function.  I think
> I have an idea how to work around it, but I'll need some time.

Well, it looks like $(eval) is pretty broken in 3.80, so either we
require 3.81 or we abandon this line of thought.



Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Well, it looks like $(eval) is pretty broken in 3.80, so either we
> require 3.81 or we abandon this line of thought.

[ emerges from some grubbing about in the gmake sources... ]
It looks to me like the bug in 3.80 is only triggered when "eval"
expands to a long enough string to trigger reallocation of the variable
buffer.  (Ergo, the reason they didn't find it sooner was they only
tested on relatively short strings.)

I wonder whether the bug could be worked around if you did the iteration
on SUBDIRS in a foreach surrounding the eval call, so that each eval
dealt with only one subdir target.  This would result in a bit of
redundancy in the generated rules, but that seems tolerable.
        regards, tom lane
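
A minimal sketch of the shape suggested above, with the foreach on the outside
so that each eval call receives the rules for only one subdirectory. The
SUBDIRS list and the recurse_rule template are hypothetical, not taken from the
committed patch, and whether this avoids the 3.80 bug is exactly what the
message proposes to test.

```make
# Hypothetical subdirectory list for illustration.
SUBDIRS = port timezone backend

# Declared first so that "all" remains the default goal.
all: $(SUBDIRS:%=all-%-recurse)

# Template for one recursion rule; $(1) is a single subdirectory.
# $$(MAKE) survives the $(call) expansion and reaches eval as $(MAKE).
define recurse_rule
all-$(1)-recurse:
	$$(MAKE) -C $(1) all
.PHONY: all-$(1)-recurse
endef

# foreach surrounds eval: each eval call parses one short string,
# instead of a single eval whose argument covers every subdirectory.
$(foreach dir,$(SUBDIRS),$(eval $(call recurse_rule,$(dir))))

.PHONY: all
```

The cost is the redundancy mentioned above: the template text is expanded and
parsed once per subdirectory.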


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Dave Page
Date:
On Sat, Nov 13, 2010 at 8:13 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Sat, 2010-11-13 at 20:07 +0200, Peter Eisentraut wrote:
>> On Sat, 2010-11-13 at 12:20 -0500, Tom Lane wrote:
>> > Peter Eisentraut <peter_e@gmx.net> writes:
>> > > On Sat, 2010-11-13 at 11:12 -0500, Tom Lane wrote:
>> > >> It looks like all the unhappy critters are getting the same "virtual
>> > >> memory exhausted" error.  I wonder whether they are all using make
>> > >> 3.80 ...
>> >
>> > > It turns out that there is an unrelated bug in 3.80 that some Linux
>> > > distributions have patched around.  3.81 or 3.82 are OK.
>> >
>> > So what do you mean by "unrelated bug"?  Can we work around it?
>>
>> The information is fuzzy, but the problem has been reported around the
>> internet, and it appears to be related to the foreach function.  I think
>> I have an idea how to work around it, but I'll need some time.
>
> Well, it looks like $(eval) is pretty broken in 3.80, so either we
> require 3.81 or we abandon this line of thought.

3.81 might be a problem for Solaris - unless I pay for a support
contract from Oracle, I'm not going to get any updates from them,
which means I'll have to install a custom build. Now that's no biggie
for me, but it does seem to raise the bar somewhat for users that might
want to build from source.


--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
Dave Page <dpage@pgadmin.org> writes:
> On Sat, Nov 13, 2010 at 8:13 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> Well, it looks like $(eval) is pretty broken in 3.80, so either we
>> require 3.81 or we abandon this line of thought.

> 3.81 might be a problem for Solaris - unless I pay for a support
> contract from Oracle, I'm not going to get any updates from them,
> which means I'll have to install a custom build. Now that's no biggie
> for me, but it does seem to raise the bar somewhat for users that might
> want to build from source.

For another data point, I find make 3.80 in OS X 10.4, while 10.5 and
10.6 have 3.81.  10.4 is certainly behind the curve, but Apple still
seem to be releasing security updates for it.

I was about to draw an analogy to flex -- we are now requiring a version
of flex that's roughly contemporaneous with make 3.81.  However, we
don't require flex to build from a tarball, so on second thought that
situation isn't very comparable.  Moving the goalposts for make would
definitely affect more people.

On the third hand, gmake is very very easy to install: if you're
capable of building Postgres from source, it's hard to believe that
gmake should scare you off.  (I've installed multiple versions on my
ancient HPUX dinosaur, and it's never been any harder than ./configure,
make, make check, make install.)

And on the fourth hand, what we're buying here is pretty marginal for
developers and of no interest whatever for users.

I still think it's worth looking into whether the bug can be dodged
by shortening the eval calls.  But if not, I think I'd vote for
reverting.  Maybe we could revisit this in a couple of years.
        regards, tom lane


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Andrew Dunstan
Date:

On 11/14/2010 10:44 AM, Tom Lane wrote:
> And on the fourth hand, what we're buying here is pretty marginal for
> developers and of no interest whatever for users.
>
> I still think it's worth looking into whether the bug can be dodged
> by shortening the eval calls.  But if not, I think I'd vote for
> reverting.  Maybe we could revisit this in a couple of years.

+1

cheers

andrew


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Robert Haas
Date:
On Nov 14, 2010, at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I still think it's worth looking into whether the bug can be dodged
> by shortening the eval calls.  But if not, I think I'd vote for
> reverting.  Maybe we could revisit this in a couple of years.

+1.  The current master branch fails to build on my (rather new) Mac with
make -j2.  I could upgrade my toolchain but it seems like more trouble
than it's worth, not to mention a possible obstacle to new users and
developers.

...Robert

Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Nov 14, 2010, at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I still think it's worth looking into whether the bug can be dodged
>> by shortening the eval calls.  But if not, I think I'd vote for
>> reverting.  Maybe we could revisit this in a couple of years.

> +1.  The current master branch fails to build on my (rather new) Mac
> with make -j2.

I complained of the same thing, but AFAICS that's not a make bug;
it's a missing build dependency, which could be fixed if we choose to
keep this infrastructure.  It probably ought to be fixed even if we
don't.
        regards, tom lane


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
I wrote:
> I still think it's worth looking into whether the bug can be dodged
> by shortening the eval calls.

In fact, that does seem to work; I'll commit a patch after testing a
bit more.

We still need someone to add the missing build dependencies so that
make -j is trustworthy again.
        regards, tom lane


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Robert Haas
Date:
On Sun, Nov 14, 2010 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I still think it's worth looking into whether the bug can be dodged
>> by shortening the eval calls.
>
> In fact, that does seem to work; I'll commit a patch after testing a
> bit more.
>
> We still need someone to add the missing build dependencies so that
> make -j is trustworthy again.

Yes, please.  This is currently failing for me:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -Werror  -bundle -multiply_defined suppress -o
ascii_and_mic.so ascii_and_mic.o -L../../../../../../src/port
-L/opt/local/lib -Wl,-dead_strip_dylibs  -Werror  -bundle_loader
../../../../../../src/backend/postgres
ld: file not found: ../../../../../../src/backend/postgres
collect2: ld returned 1 exit status
make[3]: *** [ascii_and_mic.so] Error 1
make[2]: *** [all-ascii_and_mic-recurse] Error 2
make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Bernd Helmle
Date:

--On 14. November 2010 11:08:13 -0500 Robert Haas <robertmhaas@gmail.com> 
wrote:

> +1.  The current master branch fails to build on my (rather new) Mac with
> make -j2.  I could upgrade my toolchain but it seems like more trouble
> than it's worth, not to mention a possible obstacle to new users and
> developers.

The same here, too. And it doesn't matter if you use the shipped make 
(3.81) or the one from macports (currently 3.82), both are failing with:

ld: file not found: ../../../../../../src/backend/postgres
collect2: ld returned 1 exit status
make[3]: *** [ascii_and_mic.so] Error 1
make[2]: *** [all-ascii_and_mic-recurse] Error 2
make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....

-- 
Thanks
Bernd


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Peter Eisentraut
Date:
On Mon, 2010-11-15 at 11:13 +0100, Bernd Helmle wrote:
>
> --On 14. November 2010 11:08:13 -0500 Robert Haas <robertmhaas@gmail.com>
> wrote:
>
> > +1.  The current master branch fails to build on my (rather new) Mac with
> > make -j2.  I could upgrade my toolchain but it seems like more trouble
> > than it's worth, not to mention a possible obstacle to new users and
> > developers.
>
> The same here, too. And it doesn't matter if you use the shipped make
> (3.81) or the one from macports (currently 3.82), both are failing with:
>
> ld: file not found: ../../../../../../src/backend/postgres
> collect2: ld returned 1 exit status
> make[3]: *** [ascii_and_mic.so] Error 1
> make[2]: *** [all-ascii_and_mic-recurse] Error 2
> make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
> make[1]: *** Waiting for unfinished jobs....

Untested, but the following should help you, by partially restoring the
old build order on platforms that need it.


Attachment

Re: [COMMITTERS] pgsql: Improved parallel make support

From
Robert Haas
Date:
On Mon, Nov 15, 2010 at 4:10 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> ld: file not found: ../../../../../../src/backend/postgres
>> collect2: ld returned 1 exit status
>> make[3]: *** [ascii_and_mic.so] Error 1
>> make[2]: *** [all-ascii_and_mic-recurse] Error 2
>> make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
>> make[1]: *** Waiting for unfinished jobs....
>
> Untested, but the following should help you, by partially restoring the
> old build order on platforms that need it.

Very odd, but this completely blew up the first time I tried it.

In file included from path.c:34:
pg_config_paths.h:2:11: error: missing terminating " character
In file included from path.c:34:
pg_config_paths.h:2: error: missing terminating " character
path.c:49: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’
before ‘static’

That file had a line in it that looked like this:

postgresql"

On a subsequent retry, I got:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -Werror  -bundle -multiply_defined suppress -o
dict_snowball.so dict_snowball.o api.o utilities.o
stem_ISO_8859_1_danish.o stem_ISO_8859_1_dutch.o
stem_ISO_8859_1_english.o stem_ISO_8859_1_finnish.o
stem_ISO_8859_1_french.o stem_ISO_8859_1_german.o
stem_ISO_8859_1_hungarian.o stem_ISO_8859_1_italian.o
stem_ISO_8859_1_norwegian.o stem_ISO_8859_1_porter.o
stem_ISO_8859_1_portuguese.o stem_ISO_8859_1_spanish.o
stem_ISO_8859_1_swedish.o stem_ISO_8859_2_romanian.o
stem_KOI8_R_russian.o stem_UTF_8_danish.o stem_UTF_8_dutch.o
stem_UTF_8_english.o stem_UTF_8_finnish.o stem_UTF_8_french.o
stem_UTF_8_german.o stem_UTF_8_hungarian.o stem_UTF_8_italian.o
stem_UTF_8_norwegian.o stem_UTF_8_porter.o stem_UTF_8_portuguese.o
stem_UTF_8_romanian.o stem_UTF_8_russian.o stem_UTF_8_spanish.o
stem_UTF_8_swedish.o stem_UTF_8_turkish.o -L../../../src/port
-L/opt/local/lib -Wl,-dead_strip_dylibs  -Werror  -bundle_loader
../../../src/backend/postgres
ld: file not found: ../../../src/backend/postgres
collect2: ld returned 1 exit status
make[2]: *** [dict_snowball.so] Error 1
make[1]: *** [all-backend/snowball-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Very odd, but this completely blew up the first time I tried it.

> In file included from path.c:34:
> pg_config_paths.h:2:11: error: missing terminating " character

FWIW, I didn't replicate that, but I did get this during one attempt
with -j4:

/usr/bin/ranlib: archive member: libecpg.a(typename.o) size too large (archive member extends past the end of the file)
ar: internal ranlib command failed
make[5]: *** [libecpg.a] Error 1
make[5]: *** Deleting file `libecpg.a'
make[4]: *** [submake-ecpglib] Error 2
make[3]: *** [all-compatlib-recurse] Error 2
make[3]: *** Waiting for unfinished jobs....
/usr/bin/ranlib: can't stat file output file: libecpg.a (No such file or directory)
ar: internal ranlib command failed
make[4]: *** [libecpg.a] Error 1
make[3]: *** [all-ecpglib-recurse] Error 2
make[2]: *** [all-ecpg-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....
In file included from gram.y:12101:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16242: warning: unused variable 'yyg'
make: *** [all-src-recurse] Error 2

Examination of the stdout trace makes it appear that two independent
make runs were trying to build src/interfaces/ecpg/ecpglib/libecpg.a
concurrently.  I haven't dug into it but I suspect that there are
multiple dependency chains leading to ecpg/ecpglib/.  I wonder whether
what you saw was also the result of multiple recursion paths leading
to trying to build the same target at once.  If so, that's going to
put a rather serious crimp in the idea of constraining build order
by adding more dependencies.

> On a subsequent retry, I got:
> ld: file not found: ../../../src/backend/postgres
> collect2: ld returned 1 exit status
> make[2]: *** [dict_snowball.so] Error 1

Yeah, I got that too, but adding all-backend/snowball-recurse to the
set of dependencies proposed in Peter's patch made it go away.
A cursory search for other appearances of -bundle_loader in the
make output suggests that contrib/ and src/test/regress/ are also
at risk.  This leads me to the thought that concentrating knowledge
of this issue in src/Makefile is not the right way to go at it.
And, again, the more paths leading to a make attempt in the same
directory, the worse off we are as far as the first problem goes.
But surely the "make" guys recognized this risk and have a solution?
Otherwise parallel make would be pretty useless.
        regards, tom lane
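
The patch under discussion is not reproduced in this archive. As a rough
illustration of the kind of ordering dependency being described, the sketch
below makes one recursion target wait for another on a platform that links
loadable modules against the postgres executable. The target names follow the
log output above; the PORTNAME test and the default given to it are
assumptions made so the fragment stands on its own.

```make
# Assumption: PORTNAME would normally be set by configure; it is given a
# default here only so that the sketch is self-contained.
PORTNAME ?= darwin

all: all-backend-recurse all-backend/snowball-recurse

all-backend-recurse:
	$(MAKE) -C backend all

all-backend/snowball-recurse:
	$(MAKE) -C backend/snowball all

# Where the module link step uses -bundle_loader, the backend executable
# must exist first, so order the snowball sub-make after the backend one.
ifeq ($(PORTNAME), darwin)
all-backend/snowball-recurse: all-backend-recurse
endif

.PHONY: all all-backend-recurse all-backend/snowball-recurse
```

A prerequisite like this only orders targets within one makefile. It does not
stop a second recursion path from entering the same directory at the same
time, which is the concurrent-build problem raised above.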


Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
I tried another experiment, which was "make -j100 all" on my relatively
new Linux box (2 dual-core CPUs).  It blew up real good, as per attached
stderr output, which shows evidence of more missing dependencies as well
as some additional cases of concurrent attempts to build the same
target.

It's clear to me that we are very far from having a handle on what it'll
really take to run parallel builds safely, and I am therefore now of the
opinion that we ought to revert the patch.  Hypothetical gains in
parallelism are useless if we can't actually use parallel building
reliably.  We are currently worse off than before in terms of time to
build the system.
        regards, tom lane

/usr/bin/ld: cannot find -lpgport
collect2: ld returned 1 exit status
make[3]: *** [refint.so] Error 1
make[2]: *** [../../../contrib/spi/refint.so] Error 2
make[2]: *** Waiting for unfinished jobs....
path.c: In function 'get_html_path':
path.c:615: error: 'HTMLDIR' undeclared (first use in this function)
path.c:615: error: (Each undeclared identifier is reported only once
path.c:615: error: for each function it appears in.)
path.c: In function 'get_man_path':
path.c:624: error: 'MANDIR' undeclared (first use in this function)
make[3]: *** [path.o] Error 1
make[3]: *** Deleting file `path.o'
make[3]: *** Waiting for unfinished jobs....
/usr/bin/ld: cannot find -lpgport
collect2: ld returned 1 exit status
make[3]: *** [autoinc.so] Error 1
make[2]: *** [../../../contrib/spi/autoinc.so] Error 2
make[2]: *** [submake-libpgport] Error 2
make[2]: *** Waiting for unfinished jobs....
ln: creating symbolic link `libpgtypes.so.3': File exists
make[4]: *** [libpgtypes.so.3.2] Error 1
make[4]: *** Deleting file `libpgtypes.so.3.2'
make[3]: *** [all-pgtypeslib-recurse] Error 2
make[3]: *** Waiting for unfinished jobs....
make[1]: *** [all-test/regress-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....
In file included from gram.y:12102:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16246: warning: unused variable 'yyg'
ln: creating symbolic link `libpq.so.5': File exists
make[4]: *** [libpq.so.5.4] Error 1
make[4]: *** Deleting file `libpq.so.5.4'
make[3]: *** [submake-libpq] Error 2
make[2]: *** [all-pg_dump-recurse] Error 2
make[2]: *** Waiting for unfinished jobs....
ln: creating symbolic link `libpq.so.5': File exists
make[6]: *** [libpq.so.5.4] Error 1
make[6]: *** Deleting file `libpq.so.5.4'
make[5]: *** [submake-libpq] Error 2
make[4]: *** [submake-ecpglib] Error 2
make[3]: *** [all-compatlib-recurse] Error 2
/usr/bin/ld: cannot open linker script file ../../../src/interfaces/libpq/libpq.so: No such file or directory
collect2: ld returned 1 exit status
make[3]: *** [psql] Error 1
make[2]: *** [all-psql-recurse] Error 2
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to `pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to `pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [createdb] Error 1
make[3]: *** Waiting for unfinished jobs....
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to `pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to `pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [createuser] Error 1
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to `pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to `pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [dropuser] Error 1
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to `pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to `pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [vacuumdb] Error 1
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to `pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to `pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [dropdb] Error 1
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to `pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to `pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [clusterdb] Error 1
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to `pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to `pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [reindexdb] Error 1
make[2]: *** [all-scripts-recurse] Error 2
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to `pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function `pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to `pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [pg_ctl] Error 1
make[2]: *** [all-pg_ctl-recurse] Error 2
make[1]: *** [all-bin-recurse] Error 2
make[2]: *** [all-ecpg-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2



Re: [COMMITTERS] pgsql: Improved parallel make support

From
Peter Eisentraut
Date:
On Mon, 2010-11-15 at 23:34 -0500, Tom Lane wrote:
> It's clear to me that we are very far from having a handle on what
> it'll really take to run parallel builds safely, and I am therefore
> now of the opinion that we ought to revert the patch.  Hypothetical
> gains in parallelism are useless if we can't actually use parallel
> building reliably.  We are currently worse off than before in terms of
> time to build the system.

We don't have to revert it, we just have to insert .NOTPARALLEL targets
into some places that are not properly "parallelized", thus effectively
restoring the behavior of the old for loop.  I have attached a patch
that gets make -j 100+ working for me.  Other platforms might need more
things, perhaps.

Btw., my original notes for this development were labeled "make make -k
work properly".  So I would really like to keep that.  It just turned
out that parallel make could benefit from the same changes, and it's a
better marketing name. ;-)
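
For readers unfamiliar with the special target, here is a minimal sketch of
.NOTPARALLEL in a directory makefile. It is not the attached patch; the
SUBDIRS list is hypothetical and only borrows directory names from the error
output earlier in the thread.

```make
# With no prerequisites, .NOTPARALLEL makes this invocation of make run
# its targets serially even when -j is given. Sub-makes started from the
# recipes below may still run in parallel internally, unless their own
# makefiles declare .NOTPARALLEL as well.
.NOTPARALLEL:

# Hypothetical list of subdirectories that must be built one at a time.
SUBDIRS = ecpglib pgtypeslib compatlib
RECURSE_TARGETS = $(SUBDIRS:%=all-%-recurse)

all: $(RECURSE_TARGETS)

$(RECURSE_TARGETS): all-%-recurse:
	$(MAKE) -C $* all

.PHONY: all $(RECURSE_TARGETS)
```

Because the per-directory targets are kept, "make -k" can still continue past
a failing directory; only the concurrency within this makefile is given up.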


Attachment

Re: [COMMITTERS] pgsql: Improved parallel make support

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Mon, 2010-11-15 at 23:34 -0500, Tom Lane wrote:
>> It's clear to me that we are very far from having a handle on what
>> it'll really take to run parallel builds safely, and I am therefore
>> now of the opinion that we ought to revert the patch.

> We don't have to revert it, we just have to insert .NOTPARALLEL targets
> into some places that are not properly "parallelized", thus effectively
> restoring the behavior of the old for loop.  I have attached a patch
> that gets make -j 100+ working for me.  Other platforms might need more
> things, perhaps.

If we don't have to revert it entirely, that's of course better.  Please
apply what you've got.
        regards, tom lane