Re: [HACKERS] Cutting initdb's runtime (Perl question embedded) - Mailing list pgsql-hackers

From ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Subject Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date
Msg-id d8jinm2qdhj.fsf@dalvik.ping.uio.no
Whole thread Raw
In response to Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Andres Freund <andres@anarazel.de> writes:
>> On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
>>> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
>
>> I'm a bit doubtful about improving the performance of genbki at the cost
>> of any sort of complication - it's only executed during the actual
>> build, not during initdb...  I don't see much point in doing things like
>> 3) and 4), it's just not worth it imo.
>
> Yeah, it's only run once per build, or probably even less often than that
> considering we only re-run it when the input files change.  I could get
> interested in another 20% reduction in initdb's time, but not genbki's.

Fair enough.  I still think 1), 2) and 5) are worthwile code cleanups
regardless of the performance impact.  In fact, if we don't care that
much about the performance, we can move the duplicated code in
Gen_fmgrmtab.pl and genbki.pl that turns bki_values into a hash into
Catalogs.pm.  That regresses genbki.pl time by ~30ms on my machine.

Revised patch series attached.

-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.               - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.                      - Calle Dybedahl

From f7b75bcbe993d4ddbc92da85c7148bf7fee143ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 13:26:35 +0100
Subject: [PATCH 1/4] Avoid unnecessary regex captures in Catalog.pm

---
 src/backend/catalog/Catalog.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index fa8de04..e7b647a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -54,7 +54,7 @@ sub Catalogs
         {
 
             # Strip C-style comments.
-            s;/\*(.|\n)*\*/;;g;
+            s;/\*(?:.|\n)*\*/;;g;
             if (m;/\*;)
             {
 
@@ -80,12 +80,12 @@ sub Catalogs
             {
                 $catalog{natts} = $1;
             }
-            elsif (/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
+            elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
             {
-                check_natts($filename, $catalog{natts}, $3,
+                check_natts($filename, $catalog{natts}, $2,
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $2, bki_values => $3 };
+                push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
-- 
2.7.4

From decd1b8c171cc508da5f79f01d2f0779a569a963 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 14:04:20 +0100
Subject: [PATCH 2/4] Avoid repeated calls to Catalog::SplitDataLine

Both check_natts and the callers of Catalog::Catalogs were calling
Catalog::SplitDataLines, the former discarding the result.

Instead, have Catalog::Catalogs store the split fields directly and pass
the count to check_natts
---
 src/backend/catalog/Catalog.pm   | 14 +++++++-------
 src/backend/catalog/genbki.pl    |  3 +--
 src/backend/utils/Gen_fmgrtab.pl |  3 +--
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e7b647a..81513c7 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -82,10 +82,12 @@ sub Catalogs
             }
             elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
             {
-                check_natts($filename, $catalog{natts}, $2,
+                my @bki_values = SplitDataLine($2);
+
+                check_natts($filename, $catalog{natts}, scalar(@bki_values),
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
+                push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
@@ -254,17 +256,15 @@ sub RenameTempFile
 # verify the number of fields in the passed-in DATA line
 sub check_natts
 {
-    my ($catname, $natts, $bki_val, $file, $line) = @_;
+    my ($catname, $natts, $bki_vals, $file, $line) = @_;
 
     die "Could not find definition for Natts_${catname} before start of DATA() in $file\n"
         unless defined $natts;
 
-    my $nfields = scalar(SplitDataLine($bki_val));
-
     die sprintf
         "Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
-        $file, $line, $natts, $nfields
-      unless $natts == $nfields;
+        $file, $line, $natts, $bki_vals
+      unless $natts == $bki_vals;
 }
 
 1;
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 198e072..8875f6c 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -161,9 +161,8 @@ foreach my $catname (@{ $catalogs->{names} })
         foreach my $row (@{ $catalog->{data} })
         {
 
-            # Split line into tokens without interpreting their meaning.
             my %bki_values;
-            @bki_values{@attnames} = Catalog::SplitDataLine($row->{bki_values});
+            @bki_values{@attnames} = @{$row->{bki_values}};
 
             # Perform required substitutions on fields
             foreach my $att (keys %bki_values)
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index 76bdf5c..d2c4617 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -58,9 +58,8 @@ foreach my $column (@{ $catalogs->{pg_proc}->{columns} })
 my $data = $catalogs->{pg_proc}->{data};
 foreach my $row (@$data)
 {
-    # Split line into tokens without interpreting their meaning.
     my %bki_values;
-    @bki_values{@attnames} = Catalog::SplitDataLine($row->{bki_values});
+    @bki_values{@attnames} = @{$row->{bki_values}};
 
     # Select out just the rows for internal-language procedures.
     # Note assumption here that INTERNALlanguageId is 12.
-- 
2.7.4

From a5b8e34bfab6a687d6b13293c27f1f1246bffacb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 15:19:33 +0100
Subject: [PATCH 3/4] Hashify bki values in Catlog::Catalogs

Both callers werere doing the same conversion of the bki_values array
into a hash, so do it just once in Catalog.pm
---
 src/backend/catalog/Catalog.pm   |  5 ++++-
 src/backend/catalog/genbki.pl    | 23 +++++++++++------------
 src/backend/utils/Gen_fmgrtab.pl | 18 ++++++------------
 3 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 81513c7..9c06ec1 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -87,7 +87,10 @@ sub Catalogs
                 check_natts($filename, $catalog{natts}, scalar(@bki_values),
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
+                my %bki_values;
+                @bki_values{map $_->{name}, @{$catalog{columns}}} = @bki_values;
+
+                push @{ $catalog{data} }, { oid => $oid, bki_values => \%bki_values };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 8875f6c..fba157e 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -161,24 +161,23 @@ foreach my $catname (@{ $catalogs->{names} })
         foreach my $row (@{ $catalog->{data} })
         {
 
-            my %bki_values;
-            @bki_values{@attnames} = @{$row->{bki_values}};
+            my $bki_values = $row->{bki_values};
 
             # Perform required substitutions on fields
-            foreach my $att (keys %bki_values)
+            foreach my $att (keys %$bki_values)
             {
                 # Substitute constant values we acquired above.
                 # (It's intentional that this can apply to parts of a field).
-                $bki_values{$att} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
-                $bki_values{$att} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
+                $bki_values->{$att} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
+                $bki_values->{$att} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
 
                 # Replace regproc columns' values with OIDs.
                 # If we don't have a unique value to substitute,
                 # just do nothing (regprocin will complain).
                 if ($bki_attr{$att}->{type} eq 'regproc')
                 {
-                    my $procoid = $regprocoids{$bki_values{$att}};
-                    $bki_values{$att} = $procoid
+                    my $procoid = $regprocoids{$bki_values->{$att}};
+                    $bki_values->{$att} = $procoid
                         if defined($procoid) && $procoid ne 'MULTIPLE';
                 }
             }
@@ -187,20 +186,20 @@ foreach my $catname (@{ $catalogs->{names} })
             # This relies on the order we process the files in!
             if ($catname eq 'pg_proc')
             {
-                if (defined($regprocoids{$bki_values{proname}}))
+                if (defined($regprocoids{$bki_values->{proname}}))
                 {
-                    $regprocoids{$bki_values{proname}} = 'MULTIPLE';
+                    $regprocoids{$bki_values->{proname}} = 'MULTIPLE';
                 }
                 else
                 {
-                    $regprocoids{$bki_values{proname}} = $row->{oid};
+                    $regprocoids{$bki_values->{proname}} = $row->{oid};
                 }
             }
 
             # Save pg_type info for pg_attribute processing below
             if ($catname eq 'pg_type')
             {
-                my %type = %bki_values;
+                my %type = %$bki_values;
                 $type{oid} = $row->{oid};
                 push @types, \%type;
             }
@@ -208,7 +207,7 @@ foreach my $catname (@{ $catalogs->{names} })
             # Write to postgres.bki
             my $oid = $row->{oid} ? "OID = $row->{oid} " : '';
             printf $bki "insert %s( %s )\n", $oid,
-              join(' ', @bki_values{@attnames});
+              join(' ', @{$bki_values}{@attnames});
 
             # Write comments to postgres.description and postgres.shdescription
             if (defined $row->{descr})
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index d2c4617..03a2ed5 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -49,28 +49,22 @@ my $catalogs = Catalog::Catalogs($infile);
 
 # Collect the raw data from pg_proc.h.
 my @fmgr = ();
-my @attnames;
-foreach my $column (@{ $catalogs->{pg_proc}->{columns} })
-{
-    push @attnames, $column->{name};
-}
 
 my $data = $catalogs->{pg_proc}->{data};
 foreach my $row (@$data)
 {
-    my %bki_values;
-    @bki_values{@attnames} = @{$row->{bki_values}};
+    my $bki_values = $row->{bki_values};
 
     # Select out just the rows for internal-language procedures.
     # Note assumption here that INTERNALlanguageId is 12.
-    next if $bki_values{prolang} ne '12';
+    next if $bki_values->{prolang} ne '12';
 
     push @fmgr,
       { oid    => $row->{oid},
-        strict => $bki_values{proisstrict},
-        retset => $bki_values{proretset},
-        nargs  => $bki_values{pronargs},
-        prosrc => $bki_values{prosrc}, };
+        strict => $bki_values->{proisstrict},
+        retset => $bki_values->{proretset},
+        nargs  => $bki_values->{pronargs},
+        prosrc => $bki_values->{prosrc}, };
 }
 
 # Emit headers for both files
-- 
2.7.4

From 036b1f6a01ac518501056b10b6dd011fda6815ed Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 15:57:18 +0100
Subject: [PATCH 4/4] Remove pointless Exporter usage in Catalog.pm

None of the users actually import any of the subroutines, they just call
them by their fully-qualified names.
---
 src/backend/catalog/Catalog.pm | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 9c06ec1..662df3b 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -16,11 +16,6 @@ package Catalog;
 use strict;
 use warnings;
 
-require Exporter;
-our @ISA       = qw(Exporter);
-our @EXPORT    = ();
-our @EXPORT_OK = qw(Catalogs SplitDataLine RenameTempFile);
-
 # Call this function with an array of names of header files to parse.
 # Returns a nested data structure describing the data in the headers.
 sub Catalogs
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] Passing values to a dynamic background worker
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] Quorum commit for multiple synchronous replication.