Thread: Reproducible builds: genbki.pl vs schemapg.h

Reproducible builds: genbki.pl vs schemapg.h

From
Christoph Berg
Date:
Hi,

Debian's reproducible builds project has revealed that the full build
path gets embedded into server/catalog/schemapg.h:

/*-------------------------------------------------------------------------
 *
 * schemapg.h
 *    Schema_pg_xxx macros for use by relcache.c
 *
 * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * NOTES
 *  ******************************
 *  *** DO NOT EDIT THIS FILE! ***
 *  ******************************
 *
 *  It has been GENERATED by /build/postgresql-11-05gRdu/build/../src/backend/catalog/genbki.pl
 *
 *-------------------------------------------------------------------------
 */

This information will vary on rebuilding the binaries, needlessly
causing different checksums.

I'm proposing to strip that down to "It has been GENERATED genbki.pl".
Patch attached.

Christoph

Attachment

Re: Reproducible builds: genbki.pl vs schemapg.h

From
Tom Lane
Date:
Christoph Berg <christoph.berg@credativ.de> writes:
> Debian's reproducible builds project has revealed that the full build
> path gets embedded into server/catalog/schemapg.h:

genbki.pl is hardly our only script that prints its $0 ...

            regards, tom lane


Re: Reproducible builds: genbki.pl vs schemapg.h

From
Christoph Berg
Date:
Re: Tom Lane 2017-12-15 <9616.1513351766@sss.pgh.pa.us>
> Christoph Berg <christoph.berg@credativ.de> writes:
> > Debian's reproducible builds project has revealed that the full build
> > path gets embedded into server/catalog/schemapg.h:
> 
> genbki.pl is hardly our only script that prints its $0 ...

As per
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html,
that's the only place that makes it into the resulting binary.
I wouldn't be sending a patch if it didn't fix the issue.

Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE


Re: Reproducible builds: genbki.pl vs schemapg.h

From
Michael Paquier
Date:
On Sat, Dec 16, 2017 at 3:13 AM, Christoph Berg
<christoph.berg@credativ.de> wrote:
> Re: Tom Lane 2017-12-15 <9616.1513351766@sss.pgh.pa.us>
>> Christoph Berg <christoph.berg@credativ.de> writes:
>> > Debian's reproducible builds project has revealed that the full build
>> > path gets embedded into server/catalog/schemapg.h:
>>
>> genbki.pl is hardly our only script that prints its $0 ...
>
> As per
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html,
> that's the only place that makes it into the resulting binary.
> I wouldn't be sending a patch if it didn't fix the issue.

Why not fixing that? Reproducible builds are a trend of these days,
and what's proposed here is really simple to make PG more compliant
with this purpose in mind.
-- 
Michael


Re: Reproducible builds: genbki.pl vs schemapg.h

From
Andres Freund
Date:
On 2017-12-16 07:52:41 +0900, Michael Paquier wrote:
> On Sat, Dec 16, 2017 at 3:13 AM, Christoph Berg
> <christoph.berg@credativ.de> wrote:
> > Re: Tom Lane 2017-12-15 <9616.1513351766@sss.pgh.pa.us>
> >> Christoph Berg <christoph.berg@credativ.de> writes:
> >> > Debian's reproducible builds project has revealed that the full build
> >> > path gets embedded into server/catalog/schemapg.h:
> >>
> >> genbki.pl is hardly our only script that prints its $0 ...
> >
> > As per
> > https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html,
> > that's the only place that makes it into the resulting binary.
> > I wouldn't be sending a patch if it didn't fix the issue.
> 
> Why not fixing that? Reproducible builds are a trend of these days,
> and what's proposed here is really simple to make PG more compliant
> with this purpose in mind.

It's not like $0 instead of a hardcoded name in the header actually buys
us anything afaict.

Greetings,

Andres Freund


Re: Reproducible builds: genbki.pl vs schemapg.h

From
Peter Geoghegan
Date:
On Fri, Dec 15, 2017 at 3:21 PM, Andres Freund <andres@anarazel.de> wrote:
>> Why not fixing that? Reproducible builds are a trend of these days,
>> and what's proposed here is really simple to make PG more compliant
>> with this purpose in mind.
>
> It's not like $0 instead of a hardcoded name in the header actually buys
> us anything afaict.

+1.

I think that reproducible builds are a worthwhile goal, and I welcome
Christoph's continued work on them.

-- 
Peter Geoghegan


Re: Reproducible builds: genbki.pl vs schemapg.h

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-12-16 07:52:41 +0900, Michael Paquier wrote:
>> On Sat, Dec 16, 2017 at 3:13 AM, Christoph Berg
>> <christoph.berg@credativ.de> wrote:
>>> Re: Tom Lane 2017-12-15 <9616.1513351766@sss.pgh.pa.us>
>>>> genbki.pl is hardly our only script that prints its $0 ...

>>> As per
>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html,
>>> that's the only place that makes it into the resulting binary.

I'm fairly confused by this claim.  Since the string in question is in a
comment, it really shouldn't affect built binaries at all.  I can believe
that it would affect the non-binary contents of the finished package,
because we include schemapg.h as one of the installed headers.  However,
we also include fmgroids.h, which also has an expanded version of $0
in it.  So if schemapg.h is affected by build path, why isn't fmgroids.h?

In my build, neither one of these files contains any path information;
I speculate that you need to use a VPATH build to have an issue, or
maybe Debian's build environment does something even weirder.  But
I feel confident in saying that if indeed fmgroids.h is invariant for
you today, that's a phase-of-the-moon behavior that will break someday
if we don't make a similar change in Gen_fmgrtab.pl.  Or else we should
propagate what's preventing it back to genbki.pl.

plperl's text2macro.pl is also emitting $0, and there may be other places
I missed (I grepped for 'DO NOT EDIT', not for $0 per se).  We seem not
to install the output file of that one, perlchunks.h, but that's a
decision that somebody might change too.

> It's not like $0 instead of a hardcoded name in the header actually buys
> us anything afaict.

Agreed so far as the script name goes.  However, two out of three of these
scripts also print their input file names, and I'm suspicious that that
output is also gonna change in a VPATH build.  I'm a little less inclined
to buy the claim that we're not losing anything if we suppress that :-(

            regards, tom lane


Re: Reproducible builds: genbki.pl vs schemapg.h

From
Christoph Berg
Date:
Re: Tom Lane 2017-12-16 <5525.1513381145@sss.pgh.pa.us>
> >>> As per
> >>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html,
> >>> that's the only place that makes it into the resulting binary.
> 
> I'm fairly confused by this claim.  Since the string in question is in a
> comment, it really shouldn't affect built binaries at all.  I can believe
> that it would affect the non-binary contents of the finished package,

"Binary" in that context was the .deb package file (in contrast to the
.dsc source package).

> In my build, neither one of these files contains any path information;
> I speculate that you need to use a VPATH build to have an issue, or
> maybe Debian's build environment does something even weirder.

This is a VPATH build, yes.

> > It's not like $0 instead of a hardcoded name in the header actually buys
> > us anything afaict.
> 
> Agreed so far as the script name goes.  However, two out of three of these
> scripts also print their input file names, and I'm suspicious that that
> output is also gonna change in a VPATH build.  I'm a little less inclined
> to buy the claim that we're not losing anything if we suppress that :-(

Well, patching this instance of $0 would fix a binary-package
variation in practise. Of course there might be more issues waiting to
come into effect, but I don't see why that would be an argument
against fixing the current issue.

Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE


Re: Reproducible builds: genbki.pl vs schemapg.h

From
Tom Lane
Date:
Christoph Berg <christoph.berg@credativ.de> writes:
>> Agreed so far as the script name goes.  However, two out of three of these
>> scripts also print their input file names, and I'm suspicious that that
>> output is also gonna change in a VPATH build.  I'm a little less inclined
>> to buy the claim that we're not losing anything if we suppress that :-(

> Well, patching this instance of $0 would fix a binary-package
> variation in practise. Of course there might be more issues waiting to
> come into effect, but I don't see why that would be an argument
> against fixing the current issue.

I think we're talking at cross-purposes.  I'm not saying we should not fix
this problem.  I'm saying that the proposed fix appears incomplete, which
means that (a) even if it solves your problem, it probably does not solve
related problems for other people; (b) since it's not clear why this
patch is apparently sufficient for you, I'd like to understand that in
some detail before deeming the problem solved; and (c) leaving instances
of the problematic code in our tree is just about guaranteed to mean
you'll have the same problem in future, when somebody either copies that
coding pattern into some new script or tweaks the way those existing
scripts are being used.

            regards, tom lane


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Christoph Berg
Date:
Re: Tom Lane 2017-12-16 <417.1513438031@sss.pgh.pa.us>
> I think we're talking at cross-purposes.  I'm not saying we should not fix
> this problem.  I'm saying that the proposed fix appears incomplete, which
> means that (a) even if it solves your problem, it probably does not solve
> related problems for other people; (b) since it's not clear why this
> patch is apparently sufficient for you, I'd like to understand that in
> some detail before deeming the problem solved; and (c) leaving instances
> of the problematic code in our tree is just about guaranteed to mean
> you'll have the same problem in future, when somebody either copies that
> coding pattern into some new script or tweaks the way those existing
> scripts are being used.

Grepping through the source, there are three places where $0 printed
to files in regular operation (as opposed to being used in --help):

* src/backend/catalog/genbki.pl
  *  It has been GENERATED by $0

* src/backend/utils/Gen_fmgrtab.pl
  *     It has been GENERATED by $0

* src/backend/utils/mb/Unicode/convutils.pm, a perl module used by
  src/backend/utils/mb/Unicode/*_to_*.pl
  my $this_script = $0;
  print $out "/* This file is generated by $this_script */\n\n";

The latter is not relevant because the generated .map files just
included in other .c files, but not shipped in source. (And a changed
comment does not modify the build result.)

The first case is fixed in my original patch, the updated patch
catches the second as well.

I believe the reason why we've only been seeing half of the problem
yet is that the generated files are shipped with the tarballs, so it
might be a timestamping issue determining if the scripts are
re-executed.

Filed as https://commitfest.postgresql.org/16/1417/

Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE

Attachment

Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Tom Lane
Date:
Christoph Berg <christoph.berg@credativ.de> writes:
> Re: Tom Lane 2017-12-16 <417.1513438031@sss.pgh.pa.us>
>> I think we're talking at cross-purposes.  I'm not saying we should not fix
>> this problem.  I'm saying that the proposed fix appears incomplete ...

> Grepping through the source, there are three places where $0 printed
> to files in regular operation (as opposed to being used in --help):

I poked around and found a few more.

> I believe the reason why we've only been seeing half of the problem
> yet is that the generated files are shipped with the tarballs, so it
> might be a timestamping issue determining if the scripts are
> re-executed.

Right; some parts of this problem would only materialize for you if you
needed to rebuild the generated files that are included in the tarball,
which should basically not be happening in normal packager builds.
Rather the risk is at our end: if we ever switched the tarball creation
process to be a VPATH build, then there'd be path dependencies in the
created tarballs.  That would be bad.

More generally, my concern here is not just that we fix this problem
but that it stays fixed.  If some individual scripts print $0 into
their output and it happens to not affect any built distribution files
today, it's still bad, because tomorrow somebody might copy that coding
pattern into someplace else where it matters more.  I think we need a
project policy that thou shalt not print $0 into generated files, period.

Also, experimenting with a VPATH build, I verified that such "helpful"
practices as printing $infile or @ARGV into the output file will also
create path dependencies.  So I think we need to lose those too.
It's not like they're adding any info you can't find out from the
Makefiles.

On the other hand, there is something we can do that will improve
matters: rather than just printing the base name of the script,
let's print its full relative path within the PG sources, eg instead
of Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.
My thought here is that if you're not already intimately familiar
with a script you might not remember where it lives, the more so
if you're looking at a file that's been put into an installation
tree far away from where it was generated.  I see that this policy
was already followed in some places, just not in the ones that were
using the $0 shortcut.

In short, I propose the attached more-extensive patch.

Some of the files generated by these scripts, particularly the map
files generated by the src/backend/utils/mb/Unicode/ scripts, are
not just present in tarballs but are actually in our git repo.
So changing those scripts won't affect anything until/unless someone
updates the repo's generated files, which I've not done here and
don't feel much need to do.  I just want to establish a principle
that we don't print path-dependent info into generated files.

            regards, tom lane

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 256a9c9..e4a0b8b 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -340,7 +340,7 @@ print $schemapg <<EOM;
  *  *** DO NOT EDIT THIS FILE! ***
  *  ******************************
  *
- *  It has been GENERATED by $0
+ *  It has been GENERATED by src/backend/catalog/genbki.pl
  *
  *-------------------------------------------------------------------------
  */
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index ee89d50..26b428b 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -118,8 +118,7 @@ qq|/*-------------------------------------------------------------------------
  *    *** DO NOT EDIT THIS FILE! ***
  *    ******************************
  *
- *    It has been GENERATED by $0
- *    from $infile
+ *    It has been GENERATED by src/backend/utils/Gen_fmgrtab.pl
  *
  *-------------------------------------------------------------------------
  */
@@ -153,8 +152,7 @@ qq|/*-------------------------------------------------------------------------
  *    *** DO NOT EDIT THIS FILE! ***
  *    ******************************
  *
- *    It has been GENERATED by $0
- *    from $infile
+ *    It has been GENERATED by src/backend/utils/Gen_fmgrtab.pl
  *
  *-------------------------------------------------------------------------
  */
@@ -181,8 +179,7 @@ qq|/*-------------------------------------------------------------------------
  *    *** DO NOT EDIT THIS FILE! ***
  *    ******************************
  *
- *    It has been GENERATED by $0
- *    from $infile
+ *    It has been GENERATED by src/backend/utils/Gen_fmgrtab.pl
  *
  *-------------------------------------------------------------------------
  */
diff --git a/src/backend/utils/mb/Unicode/UCS_to_BIG5.pl b/src/backend/utils/mb/Unicode/UCS_to_BIG5.pl
index 05822b2..f73e2a1 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_BIG5.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_BIG5.pl
@@ -27,7 +27,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_BIG5.pl';

 # Load BIG5.TXT
 my $all = &read_source("BIG5.TXT");
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
index f7f9496..63f5e1b 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
@@ -16,7 +16,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl';

 # Read the input

diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
index 2290031..4dac7d2 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
@@ -10,7 +10,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl';

 # first generate UTF-8 --> EUC_JIS_2004 table

diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
index 5db663f..77ce273 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
@@ -14,7 +14,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl';

 # Load JIS0212.TXT
 my $jis0212 = &read_source("JIS0212.TXT");
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
index f0d978d..8e0126c 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
@@ -19,7 +19,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl';

 # Load the source file.

diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
index 0bfcbd5..61013e3 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
@@ -20,7 +20,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl';

 my $mapping = &read_source("CNS11643.TXT");

diff --git a/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl b/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
index 4469cc7..e9f816c 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
@@ -16,7 +16,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_GB18030.pl';

 # Read the input

diff --git a/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl b/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
index 7c6f526..be10d52 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
@@ -18,7 +18,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl';

 # Load the source file.

diff --git a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
index d1b36c0..98a6ee7 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
@@ -12,7 +12,7 @@ use convutils;

 # first generate UTF-8 --> SHIFT_JIS_2004 table

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl';

 my $in_file = "sjis-0213-2004-std.txt";

diff --git a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
index 7460964..cc1edcc 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
@@ -13,7 +13,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_SJIS.pl';

 my $mapping = read_source("CP932.TXT");

diff --git a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
index 8d99ca3..640a2ec 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
@@ -16,7 +16,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_UHC.pl';

 # Read the input

diff --git a/src/backend/utils/mb/Unicode/UCS_to_most.pl b/src/backend/utils/mb/Unicode/UCS_to_most.pl
index 1c3922e..2d69748 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_most.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_most.pl
@@ -18,7 +18,7 @@
 use strict;
 use convutils;

-my $this_script = $0;
+my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_most.pl';

 my %filename = (
     'WIN866'     => 'CP866.TXT',
diff --git a/src/bin/psql/create_help.pl b/src/bin/psql/create_help.pl
index cedb767..9fa1855 100644
--- a/src/bin/psql/create_help.pl
+++ b/src/bin/psql/create_help.pl
@@ -51,8 +51,7 @@ print $hfile_handle "/*
  * *** Do not change this file by hand. It is automatically
  * *** generated from the DocBook documentation.
  *
- * generated by
- *     $^X $0 @ARGV
+ * generated by src/bin/psql/create_help.pl
  *
  */

@@ -76,8 +75,7 @@ print $cfile_handle "/*
  * *** Do not change this file by hand. It is automatically
  * *** generated from the DocBook documentation.
  *
- * generated by
- *     $^X $0 @ARGV
+ * generated by src/bin/psql/create_help.pl
  *
  */

@@ -131,7 +129,7 @@ foreach my $file (sort readdir DIR)
         my $nl_count = () = $cmdsynopsis =~ /\n/g;

         $cmdsynopsis =~ m!</>!
-          and die "$0:$file: null end tag not supported in synopsis\n";
+          and die "$0: $file: null end tag not supported in synopsis\n";
         $cmdsynopsis =~ s/%/%%/g;

         while ($cmdsynopsis =~ m!<(\w+)[^>]*>(.+?)</\1[^>]*>!)
diff --git a/src/pl/plperl/text2macro.pl b/src/pl/plperl/text2macro.pl
index e681fca..27c6ef7 100644
--- a/src/pl/plperl/text2macro.pl
+++ b/src/pl/plperl/text2macro.pl
@@ -40,7 +40,7 @@ die "No text files specified"
 print qq{
 /*
  * DO NOT EDIT - THIS FILE IS AUTOGENERATED - CHANGES WILL BE LOST
- * Written by $0 from @ARGV
+ * Generated by src/pl/plperl/text2macro.pl
  */
 };

diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 6019040..72826d5 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -66,7 +66,7 @@ BEGIN
     delete $ENV{PGPORT};
     delete $ENV{PGHOST};

-    $ENV{PGAPPNAME} = $0;
+    $ENV{PGAPPNAME} = basename($0);

     # Must be set early
     $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';

Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Andrew Dunstan
Date:

On 12/20/2017 09:50 PM, Tom Lane wrote:
> Christoph Berg <christoph.berg@credativ.de> writes:
>> Re: Tom Lane 2017-12-16 <417.1513438031@sss.pgh.pa.us>
>>> I think we're talking at cross-purposes.  I'm not saying we should not fix
>>> this problem.  I'm saying that the proposed fix appears incomplete ...
>> Grepping through the source, there are three places where $0 printed
>> to files in regular operation (as opposed to being used in --help):
> I poked around and found a few more.
>
>> I believe the reason why we've only been seeing half of the problem
>> yet is that the generated files are shipped with the tarballs, so it
>> might be a timestamping issue determining if the scripts are
>> re-executed.
> Right; some parts of this problem would only materialize for you if you
> needed to rebuild the generated files that are included in the tarball,
> which should basically not be happening in normal packager builds.
> Rather the risk is at our end: if we ever switched the tarball creation
> process to be a VPATH build, then there'd be path dependencies in the
> created tarballs.  That would be bad.
>
> More generally, my concern here is not just that we fix this problem
> but that it stays fixed.  If some individual scripts print $0 into
> their output and it happens to not affect any built distribution files
> today, it's still bad, because tomorrow somebody might copy that coding
> pattern into someplace else where it matters more.  I think we need a
> project policy that thou shalt not print $0 into generated files, period.
>
> Also, experimenting with a VPATH build, I verified that such "helpful"
> practices as printing $infile or @ARGV into the output file will also
> create path dependencies.  So I think we need to lose those too.
> It's not like they're adding any info you can't find out from the
> Makefiles.
>
> On the other hand, there is something we can do that will improve
> matters: rather than just printing the base name of the script,
> let's print its full relative path within the PG sources, eg instead
> of Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.
> My thought here is that if you're not already intimately familiar
> with a script you might not remember where it lives, the more so
> if you're looking at a file that's been put into an installation
> tree far away from where it was generated.  I see that this policy
> was already followed in some places, just not in the ones that were
> using the $0 shortcut.
>
> In short, I propose the attached more-extensive patch.
>
> Some of the files generated by these scripts, particularly the map
> files generated by the src/backend/utils/mb/Unicode/ scripts, are
> not just present in tarballs but are actually in our git repo.
> So changing those scripts won't affect anything until/unless someone
> updates the repo's generated files, which I've not done here and
> don't feel much need to do.  I just want to establish a principle
> that we don't print path-dependent info into generated files.
>
>         


Looks reasonable. Regarding the change to TestLib.pm, we should make
sure that the tests have unique names. There is a small amount of
duplication currently:

    ./src/bin/pg_dump/t/001_basic.pl
    ./src/bin/pg_rewind/t/001_basic.pl
    ./src/test/modules/commit_ts/t/001_base.pl
    ./src/test/modules/test_pg_dump/t/001_base.pl


cheers

andrew

-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Michael Paquier
Date:
On Thu, Dec 21, 2017 at 10:13 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> Looks reasonable. Regarding the change to TestLib.pm, we should make
> sure that the tests have unique names. There is a small amount of
> duplication currently:
>
>     ./src/bin/pg_dump/t/001_basic.pl
>     ./src/bin/pg_rewind/t/001_basic.pl
>     ./src/test/modules/commit_ts/t/001_base.pl
>     ./src/test/modules/test_pg_dump/t/001_base.pl

Why does the uniqueness of application_name set in the TAP test suite
matter? Each instance of Postgres is self-contained into its own Unix
domain path and port.
-- 
Michael


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Andrew Dunstan
Date:

On 12/21/2017 08:27 AM, Michael Paquier wrote:
> On Thu, Dec 21, 2017 at 10:13 PM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> Looks reasonable. Regarding the change to TestLib.pm, we should make
>> sure that the tests have unique names. There is a small amount of
>> duplication currently:
>>
>>     ./src/bin/pg_dump/t/001_basic.pl
>>     ./src/bin/pg_rewind/t/001_basic.pl
>>     ./src/test/modules/commit_ts/t/001_base.pl
>>     ./src/test/modules/test_pg_dump/t/001_base.pl
> Why does the uniqueness of application_name set in the TAP test suite
> matter? Each instance of Postgres is self-contained into its own Unix
> domain path and port.



I've found it annoying in the past. Probably now that the buildfarm
reports each test series separately it matters less. Previously we
reported many of the TAP tests together.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Christoph Berg
Date:
Re: Andrew Dunstan 2017-12-21 <d68d254a-6f97-a8c2-4656-ab8886fe2c38@2ndQuadrant.com>
> >>     ./src/bin/pg_dump/t/001_basic.pl
> >>     ./src/bin/pg_rewind/t/001_basic.pl
> >>     ./src/test/modules/commit_ts/t/001_base.pl
> >>     ./src/test/modules/test_pg_dump/t/001_base.pl
> > Why does the uniqueness of application_name set in the TAP test suite
> > matter? Each instance of Postgres is self-contained into its own Unix
> > domain path and port.
> 
> I've found it annoying in the past. Probably now that the buildfarm
> reports each test series separately it matters less. Previously we
> reported many of the TAP tests together.

Rename them to the directory name:

    ./src/bin/pg_dump/t/001_pg_dump.pl
    ./src/bin/pg_rewind/t/001_pg_rewind.pl
    ./src/test/modules/commit_ts/t/001_commit_ts.pl
    ./src/test/modules/test_pg_dump/t/001_test_pg_dump.pl

Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Alvaro Herrera
Date:
Christoph Berg wrote:
> Re: Andrew Dunstan 2017-12-21 <d68d254a-6f97-a8c2-4656-ab8886fe2c38@2ndQuadrant.com>

> > I've found it annoying in the past. Probably now that the buildfarm
> > reports each test series separately it matters less. Previously we
> > reported many of the TAP tests together.
> 
> Rename them to the directory name:
> 
>     ./src/bin/pg_rewind/t/001_pg_rewind.pl
>     ./src/test/modules/commit_ts/t/001_commit_ts.pl
>     ./src/test/modules/test_pg_dump/t/001_test_pg_dump.pl

These sound good to me.

>     ./src/bin/pg_dump/t/001_pg_dump.pl

There already is t/002_pg_dump.pl.  src/bin/pg_dump/t/001_basic.pl is
all about command lines, perhaps we can find a name that evokes that --
001_dump_cmdline.pl?  (prefixed "_dump_" to be in line with
t/010_dump_connstr.pl)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> Looks reasonable. Regarding the change to TestLib.pm, we should make
> sure that the tests have unique names. There is a small amount of
> duplication currently:

>     ./src/bin/pg_dump/t/001_basic.pl
>     ./src/bin/pg_rewind/t/001_basic.pl
>     ./src/test/modules/commit_ts/t/001_base.pl
>     ./src/test/modules/test_pg_dump/t/001_base.pl

Perhaps, but that's a separate/pre-existing problem no?  The point
of the change in my patch was just to avoid getting different
behavior in a VPATH build.

            regards, tom lane


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Peter Eisentraut
Date:
On 12/21/17 08:13, Andrew Dunstan wrote:
> Looks reasonable. Regarding the change to TestLib.pm, we should make
> sure that the tests have unique names. There is a small amount of
> duplication currently:
> 
>     ./src/bin/pg_dump/t/001_basic.pl
>     ./src/bin/pg_rewind/t/001_basic.pl
>     ./src/test/modules/commit_ts/t/001_base.pl
>     ./src/test/modules/test_pg_dump/t/001_base.pl

But that's one of the reasons one has directories, so you don't need to
have globally unique names.  I don't actually see why the change in
TestLib.pm is necessary or how it relates to this thread.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Peter Eisentraut
Date:
On 12/20/17 21:50, Tom Lane wrote:
> -my $this_script = $0;
> +my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_BIG5.pl';

This kind of things looks awful.  Why is this better?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
David Fetter
Date:
On Sun, Dec 24, 2017 at 12:26:53PM -0500, Peter Eisentraut wrote:
> On 12/20/17 21:50, Tom Lane wrote:
> > -my $this_script = $0;
> > +my $this_script = 'src/backend/utils/mb/Unicode/UCS_to_BIG5.pl';
> 
> This kind of things looks awful.  Why is this better?

What's wrong with using the path?  As you point out elsewhere on this
thread, it's a way to identify the program uniquely and reproducibly.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Michael Paquier
Date:
On Mon, Dec 25, 2017 at 2:25 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 12/21/17 08:13, Andrew Dunstan wrote:
>> Looks reasonable. Regarding the change to TestLib.pm, we should make
>> sure that the tests have unique names. There is a small amount of
>> duplication currently:
>>
>>     ./src/bin/pg_dump/t/001_basic.pl
>>     ./src/bin/pg_rewind/t/001_basic.pl
>>     ./src/test/modules/commit_ts/t/001_base.pl
>>     ./src/test/modules/test_pg_dump/t/001_base.pl
>
> But that's one of the reasons one has directories, so you don't need to
> have globally unique names.  I don't actually see why the change in
> TestLib.pm is necessary or how it relates to this thread.

+1 to both things here. I have a hard time undestanding why changing the
output of TAP test logs has anything to do with this thread, as well as
why having unique file names is mandatory. If this really matters, I would
suggest spawning a different thread.
-- 
Michael


Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 12/21/17 08:13, Andrew Dunstan wrote:
>> Looks reasonable. Regarding the change to TestLib.pm, we should make
>> sure that the tests have unique names. There is a small amount of
>> duplication currently:
>> 
>> ./src/bin/pg_dump/t/001_basic.pl
>> ./src/bin/pg_rewind/t/001_basic.pl
>> ./src/test/modules/commit_ts/t/001_base.pl
>> ./src/test/modules/test_pg_dump/t/001_base.pl

> But that's one of the reasons one has directories, so you don't need to
> have globally unique names.

I agree that it's not obvious that we need to rename these scripts.
In the current setup for running regression tests, the associated
postmaster logs would go into different subdirectories, so the conflict
in "application" names appearing in log entries doesn't seem like it
would lead to real confusion.

> I don't actually see why the change in
> TestLib.pm is necessary or how it relates to this thread.

Basically, as the code stood:

    $ENV{PGAPPNAME} = $0;

running any TAP test in a VPATH build would result in PGAPPNAME becoming a
full path, resulting in different log entries than you get in a non-VPATH
build.  That seemed undesirable to me, so I added basename() to eliminate
the difference in behavior.

There's some room to argue that given that we have duplicate script names,
a (partial) path would be a good thing.  But if you believe that, then we
should attempt to make that happen the same way with or without VPATH.

            regards, tom lane