Thread: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Stefan Kaltenbrunner

Date:

05 January 2010, 13:25:05

Tom Lane wrote:
> Log Message:
> -----------
> Get rid of the need for manual maintenance of the initial contents of
> pg_attribute, by having genbki.pl derive the information from the various
> catalog header files.  This greatly simplifies modification of the
> "bootstrapped" catalogs.
> 
> This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on
> Perl scripts for those build steps.  To avoid creating a Perl build dependency
> where there was not one before, the output files generated by these scripts
> are now treated as distprep targets, ie, they will be built and shipped in
> tarballs.  But you will need a reasonably modern Perl (probably at least
> 5.6) if you want to build from a CVS pull.

this broke the build on spoonbill:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2010-01-05%2015:05:08

manually executing the command shows that the perl process eats more 
than 250MB of RAM at closely afterwards fails with an out of memory due 
to hitting the process limit on that box.
I don't think that is in any way sane :)


# perl -v
This is perl, v5.8.8 built for sparc64-openbsd


Stefan

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Robert Haas

Date:

05 January 2010, 13:40:17

On Tue, Jan 5, 2010 at 12:24 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
> Tom Lane wrote:
>>
>> Log Message:
>> -----------
>> Get rid of the need for manual maintenance of the initial contents of
>> pg_attribute, by having genbki.pl derive the information from the various
>> catalog header files.  This greatly simplifies modification of the
>> "bootstrapped" catalogs.
>>
>> This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely
>> entirely on
>> Perl scripts for those build steps.  To avoid creating a Perl build
>> dependency
>> where there was not one before, the output files generated by these
>> scripts
>> are now treated as distprep targets, ie, they will be built and shipped in
>> tarballs.  But you will need a reasonably modern Perl (probably at least
>> 5.6) if you want to build from a CVS pull.
>
> this broke the build on spoonbill:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2010-01-05%2015:05:08
>
> manually executing the command shows that the perl process eats more than
> 250MB of RAM at closely afterwards fails with an out of memory due to
> hitting the process limit on that box.
> I don't think that is in any way sane :)
>
>
> # perl -v
> This is perl, v5.8.8 built for sparc64-openbsd

I just tried this with ulimit -v 131072 and it worked.  At 65536 and
32768 it emited an error about being unable to set the locale but
still seemed to run.  At 32768 it couldn't load all its shared
libraries any more so it croaked, but with a different error message.

Can we get the output of ulimit -a on that machine?

Is there by any chance some other, conflicting Catalog.pm on that machine?

...Robert

Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Tom Lane

Date:

05 January 2010, 13:40:18

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> this broke the build on spoonbill:

> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2010-01-05%2015:05:08

> manually executing the command shows that the perl process eats more 
> than 250MB of RAM at closely afterwards fails with an out of memory due 
> to hitting the process limit on that box.
> I don't think that is in any way sane :)

Bizarre.  Gen_fmgrtab.pl doesn't eat anywhere near that much RAM here.
Anybody else seeing the same?
        regards, tom lane

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Stefan Kaltenbrunner

Date:

05 January 2010, 13:49:50

Robert Haas wrote:
> On Tue, Jan 5, 2010 at 12:24 PM, Stefan Kaltenbrunner
> <stefan@kaltenbrunner.cc> wrote:
>> Tom Lane wrote:
>>> Log Message:
>>> -----------
>>> Get rid of the need for manual maintenance of the initial contents of
>>> pg_attribute, by having genbki.pl derive the information from the various
>>> catalog header files.  This greatly simplifies modification of the
>>> "bootstrapped" catalogs.
>>>
>>> This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely
>>> entirely on
>>> Perl scripts for those build steps.  To avoid creating a Perl build
>>> dependency
>>> where there was not one before, the output files generated by these
>>> scripts
>>> are now treated as distprep targets, ie, they will be built and shipped in
>>> tarballs.  But you will need a reasonably modern Perl (probably at least
>>> 5.6) if you want to build from a CVS pull.
>> this broke the build on spoonbill:
>>
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2010-01-05%2015:05:08
>>
>> manually executing the command shows that the perl process eats more than
>> 250MB of RAM at closely afterwards fails with an out of memory due to
>> hitting the process limit on that box.
>> I don't think that is in any way sane :)
>>
>>
>> # perl -v
>> This is perl, v5.8.8 built for sparc64-openbsd
> 
> I just tried this with ulimit -v 131072 and it worked.  At 65536 and
> 32768 it emited an error about being unable to set the locale but
> still seemed to run.  At 32768 it couldn't load all its shared
> libraries any more so it croaked, but with a different error message.
> 
> Can we get the output of ulimit -a on that machine?

$ ulimit -a
time(cpu-seconds)    unlimited
file(blocks)         unlimited
coredump(blocks)     unlimited
data(kbytes)         524288
stack(kbytes)        4096
lockedmem(kbytes)    334589
memory(kbytes)       1000456
nofiles(descriptors) 128
processes            64


> 
> Is there by any chance some other, conflicting Catalog.pm on that machine?

as I said I can reproduce it manually withe the Catalog.pm from the 
failing build as well.
I can succeed building it using the root account but that runs the box 
more or less out of memory as it eats up to ~550MB RSS and 990MB of SIZE...



Stefan

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Tom Lane

Date:

05 January 2010, 13:52:04

Robert Haas <robertmhaas@gmail.com> writes:
> Is there by any chance some other, conflicting Catalog.pm on that machine?

Even if there is, shouldn't the use of "perl -I" ensure our version gets
loaded?

Also, Stefan's log shows that it got through genbki.pl, so whatever the
problem is, I don't think it's Catalog.pm's fault.
        regards, tom lane

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Robert Haas

Date:

05 January 2010, 14:11:18

On Tue, Jan 5, 2010 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Is there by any chance some other, conflicting Catalog.pm on that machine?
>
> Even if there is, shouldn't the use of "perl -I" ensure our version gets
> loaded?

I would certainly think so.

> Also, Stefan's log shows that it got through genbki.pl, so whatever the
> problem is, I don't think it's Catalog.pm's fault.

Beats me.  I don't have any other ideas at the moment.

...Robert

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Tom Lane

Date:

05 January 2010, 14:18:44

Robert Haas <robertmhaas@gmail.com> writes:
> Beats me.  I don't have any other ideas at the moment.

Stefan, could you try adding some debugging printouts to the
Gen_fmgrtab.pl script to see how far it gets before blowing up?
        regards, tom lane

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Stefan Kaltenbrunner

Date:

05 January 2010, 14:44:34

Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Beats me.  I don't have any other ideas at the moment.
> 
> Stefan, could you try adding some debugging printouts to the
> Gen_fmgrtab.pl script to see how far it gets before blowing up?

did that and it seems the problem is in the loop that does:

foreach my $row (@$data)
{
    # To construct fmgroids.h and fmgrtab.c, we need to inspect some    # of the individual data fields.  Just
splittingon whitespace    # won't work, because some quoted fields might contain internal    # whitespace.  We handle
thisby folding them all to a simple    # "xxx". Fortunately, this script doesn't need to look at any    # fields that
mightneed quoting, so this simple hack is    # sufficient.
 

...
}

it does after around 1050 iterations of that loop at it seems to leak 
exactly 240kbyte per iteration which sums up to around 250MB in total 
for the process...


Stefan

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Tom Lane

Date:

05 January 2010, 15:03:07

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> did that and it seems the problem is in the loop that does:

> foreach my $row (@$data)
> {

>      # To construct fmgroids.h and fmgrtab.c, we need to inspect some
>      # of the individual data fields.  Just splitting on whitespace

Huh.  It's weird that I don't see a leak in either 5.8.7 or 5.10.1,
which are the two closest perl versions I have handy here.  I think
this may be a platform-specific Perl bug.  Still, it would be nice
to work around it if we can.

I'm not nearly good enough in Perl to be sure about the semantics
of this loop.  Is it possible that it's changing the global contents
of the @$data structure, rather than just hacking a local copy of
each row before pushing some values into @fmgr?
        regards, tom lane

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Alvaro Herrera

Date:

05 January 2010, 15:04:19

Stefan Kaltenbrunner escribió:

> foreach my $row (@$data)
> {
> 
>     # To construct fmgroids.h and fmgrtab.c, we need to inspect some
>     # of the individual data fields.  Just splitting on whitespace
>     # won't work, because some quoted fields might contain internal
>     # whitespace.  We handle this by folding them all to a simple
>     # "xxx". Fortunately, this script doesn't need to look at any
>     # fields that might need quoting, so this simple hack is
>     # sufficient.
> 
> ...
> }
> 
> it does after around 1050 iterations of that loop at it seems to
> leak exactly 240kbyte per iteration which sums up to around 250MB in
> total for the process...

Hmm, is this running some old Perl version?  Perhaps it's not freeing
memory timely ... maybe unsetting/deleting $row after each iteration?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Stefan Kaltenbrunner

Date:

05 January 2010, 15:23:42

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> did that and it seems the problem is in the loop that does:
>
>> foreach my $row (@$data)
>> {
>
>>      # To construct fmgroids.h and fmgrtab.c, we need to inspect some
>>      # of the individual data fields.  Just splitting on whitespace
>
> Huh.  It's weird that I don't see a leak in either 5.8.7 or 5.10.1,
> which are the two closest perl versions I have handy here.  I think
> this may be a platform-specific Perl bug.  Still, it would be nice
> to work around it if we can.

yeah it is probably some strange platform specific issue - however with
some help from the IRC channel I came up with the following patch that
considerable reduces the memory bloat (but still needs ~55MB max) and
allows the build to succeed.


>
> I'm not nearly good enough in Perl to be sure about the semantics
> of this loop.  Is it possible that it's changing the global contents
> of the @$data structure, rather than just hacking a local copy of
> each row before pushing some values into @fmgr?

hmm it is leaking a constant amount of 240kbyte per loop iteration with
the original code but yeah very weird issue...


Stefan
Index: src/backend/utils/Gen_fmgrtab.pl
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/Gen_fmgrtab.pl,v
retrieving revision 1.4
diff -u -r1.4 Gen_fmgrtab.pl
--- src/backend/utils/Gen_fmgrtab.pl    5 Jan 2010 01:06:56 -0000    1.4
+++ src/backend/utils/Gen_fmgrtab.pl    5 Jan 2010 19:14:10 -0000
@@ -56,7 +56,7 @@
 }

 my $data = $catalogs->{pg_proc}->{data};
-foreach my $row (@$data)
+while( my $row = shift @$data )
 {

     # To construct fmgroids.h and fmgrtab.c, we need to inspect some

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Tom Lane

Date:

05 January 2010, 15:32:01

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> yeah it is probably some strange platform specific issue - however with 
> some help from the IRC channel I came up with the following patch that
> considerable reduces the memory bloat (but still needs ~55MB max) and 
> allows the build to succeed.

What about Alvaro's idea of adding
$row = undef;

at the bottom of the loop?  That seems marginally less ugly...
        regards, tom lane

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Tom Lane

Date:

05 January 2010, 15:42:22

Garick Hamlin <ghamlin@isc.upenn.edu> writes:
> On Tue, Jan 05, 2010 at 02:02:51PM -0500, Tom Lane wrote:
>> I'm not nearly good enough in Perl to be sure about the semantics
>> of this loop.  Is it possible that it's changing the global contents
>> of the @$data structure, rather than just hacking a local copy of
>> each row before pushing some values into @fmgr?

> Yes, that is what would happen.  
> Is that a problem? (I haven't read the script)

Well, it *shouldn't* be a problem, but what it looks like to me is
that Stefan's perl version is somehow leaking one copy of the whole
$data structure each time through the loop.  If we were to copy and
then modify each row individually, maybe that'd dodge the bug.
        regards, tom lane

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Garick Hamlin

Date:

05 January 2010, 15:43:17

On Tue, Jan 05, 2010 at 02:02:51PM -0500, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> > did that and it seems the problem is in the loop that does:
> 
> > foreach my $row (@$data)
> > {
> 
> >      # To construct fmgroids.h and fmgrtab.c, we need to inspect some
> >      # of the individual data fields.  Just splitting on whitespace
> 
> Huh.  It's weird that I don't see a leak in either 5.8.7 or 5.10.1,
> which are the two closest perl versions I have handy here.  I think
> this may be a platform-specific Perl bug.  Still, it would be nice
> to work around it if we can.
> 
> I'm not nearly good enough in Perl to be sure about the semantics
> of this loop.  Is it possible that it's changing the global contents
> of the @$data structure, rather than just hacking a local copy of
> each row before pushing some values into @fmgr?
Yes, that is what would happen.  
I have not read this script at all but that is how perl works...

$row is an aliased to each scalar in the list '(@$data)'

There is no copying.  Copying happens on list assignment, 
but not with for/foreach.

If you wanted a copy you need something like:
foreach my $row (@{[ @$data ]}) {

for a shallow copy.

Is that a problem? (I haven't read the script)

Garick

> 
>             regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Stefan Kaltenbrunner

Date:

05 January 2010, 15:44:26

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> yeah it is probably some strange platform specific issue - however with 
>> some help from the IRC channel I came up with the following patch that
>> considerable reduces the memory bloat (but still needs ~55MB max) and 
>> allows the build to succeed.
> 
> What about Alvaro's idea of adding
> 
>     $row = undef;
> 
> at the bottom of the loop?  That seems marginally less ugly...

not sure I want to argue about the ugliness of perl but that has the 
same effect as my patch so either way would do to get spoonbill green again.

Stefan

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Tom Lane

Date:

05 January 2010, 16:24:13

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> What about Alvaro's idea of adding
>> $row = undef;
>> at the bottom of the loop?  That seems marginally less ugly...

> not sure I want to argue about the ugliness of perl but that has the 
> same effect as my patch so either way would do to get spoonbill green again.

OK, done.
        regards, tom lane

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Andrew Dunstan

Date:

05 January 2010, 16:36:43


Tom Lane wrote:
> I'm not nearly good enough in Perl to be sure about the semantics
> of this loop.  Is it possible that it's changing the global contents
> of the @$data structure, rather than just hacking a local copy of
> each row before pushing some values into @fmgr?
>   

That's exactly what it does. The loop variable is an alias. See 
perlsyn(1) for details.

These two lines appear to be suspicious:
   $row->{bki_values} =~ s/"[^"]*"/"xxx"/g;   @{$row}{@attnames} = split /\s+/, $row->{bki_values};

Something like:
   (my $bkival = $row->{bki_values}) =~ s/"[^"]*"/"xxx"/g;   my $atts = {};   @{$atts}{@attnames} = split /\s+/,
$bkival;

might work better.

cheers

andrew

Re: Re: [COMMITTERS] pgsql: Get rid of the need for manual maintenance of the initial

From

Tom Lane

Date:

05 January 2010, 16:48:54

Andrew Dunstan <andrew@dunslane.net> writes:
> Something like:
>     (my $bkival = $row->{bki_values}) =~ s/"[^"]*"/"xxx"/g;
>     my $atts = {};
>     @{$atts}{@attnames} = split /\s+/, $bkival;
> might work better.

I committed Alvaro's suggestion (undef at the bottom), but feel
free to clean it up if you like this way better.
        regards, tom lane