Thread: Git revision in tarballs

Git revision in tarballs

From
Magnus Hagander
Date:
I think it'd be useful to be able to identify exactly which git commit
was used to produce a tarball. This would be especially useful when
downloading snapshot tarballs where that's not entirely clear, but can
also be used to verify that the release tarballs matches what's
expected (in the extremely rare case that a tarball is rewrapped for
example).

What do people think of the attached?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: Git revision in tarballs

From
Josef Šimánek
Date:
čt 15. 7. 2021 v 10:33 odesílatel Magnus Hagander <magnus@hagander.net> napsal:
>
> I think it'd be useful to be able to identify exactly which git commit
> was used to produce a tarball. This would be especially useful when
> downloading snapshot tarballs where that's not entirely clear, but can
> also be used to verify that the release tarballs matches what's
> expected (in the extremely rare case that a tarball is rewrapped for
> example).
>
> What do people think of the attached?

The only problem I do see is adding "git" as a new dependency. That
can potentially cause troubles.

For the file name, I have seen GIT_VERSION or REVISION file names used
before in another projects. Using ".gitrevision" doesn't make sense to
me since it will be hidden on Unix by default and I'm not sure that is
intended.

> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/



Re: Git revision in tarballs

From
Magnus Hagander
Date:
On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
>
> čt 15. 7. 2021 v 10:33 odesílatel Magnus Hagander <magnus@hagander.net> napsal:
> >
> > I think it'd be useful to be able to identify exactly which git commit
> > was used to produce a tarball. This would be especially useful when
> > downloading snapshot tarballs where that's not entirely clear, but can
> > also be used to verify that the release tarballs matches what's
> > expected (in the extremely rare case that a tarball is rewrapped for
> > example).
> >
> > What do people think of the attached?
>
> The only problem I do see is adding "git" as a new dependency. That
> can potentially cause troubles.

But only for *creating* the tarballs,  and not for using them. I'm not
sure what the usecase would be to create a tarball from an environment
that doesn't have git?


> For the file name, I have seen GIT_VERSION or REVISION file names used
> before in another projects. Using ".gitrevision" doesn't make sense to
> me since it will be hidden on Unix by default and I'm not sure that is
> intended.

It was definitely intended, as I'd assume it's normally a file that
most people don't care about, but more something that scripts that
verify things would. But I'm more than happy to change it to a
different name if that's preferred. I looked around a bit and couldn't
find any general consensus for a name for such a file, but I may not
have looked carefully enough.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Git revision in tarballs

From
Michael Paquier
Date:
On Thu, Jul 15, 2021 at 01:44:45PM +0200, Magnus Hagander wrote:
> But only for *creating* the tarballs,  and not for using them. I'm not
> sure what the usecase would be to create a tarball from an environment
> that doesn't have git?

Which would likely mean somebody creating a release tarball in an
environment doing a build with what is already a release tarball.
Adding a dependency to git in this code path does not sound that bad
to me, FWIW.
--
Michael

Attachment

Re: Git revision in tarballs

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
>> The only problem I do see is adding "git" as a new dependency. That
>> can potentially cause troubles.

> But only for *creating* the tarballs,  and not for using them. I'm not
> sure what the usecase would be to create a tarball from an environment
> that doesn't have git?

I agree, this objection seems silly.  If we ever move off of git, the
process could be adapted at that time.  However, there *is* a reasonable
question whether this ought to be handled by "make dist" versus the
tarball-wrapping script.

>> For the file name, I have seen GIT_VERSION or REVISION file names used
>> before in another projects. Using ".gitrevision" doesn't make sense to
>> me since it will be hidden on Unix by default and I'm not sure that is
>> intended.

> It was definitely intended, as I'd assume it's normally a file that
> most people don't care about, but more something that scripts that
> verify things would. But I'm more than happy to change it to a
> different name if that's preferred. I looked around a bit and couldn't
> find any general consensus for a name for such a file, but I may not
> have looked carefully enough.

We already have that convention in place:

$ ls -a
./                      .gitignore      README.git      contrib/
../                     COPYRIGHT       aclocal.m4      doc/
.dir-locals.el          GNUmakefile     config/         src/
.editorconfig           GNUmakefile.in  config.log      tmp_install/
.git/                   HISTORY         config.status*
.git-blame-ignore-revs  Makefile        configure*
.gitattributes          README          configure.ac

So ".gitrevision" or the like seems fine to me.

My thoughts about the proposed patch are (1) you'd better have a
.gitignore entry too, and (2) what is the mechanism that removes
this file?  It seems weird to have a make rule that makes a
generated file but none to remove it.  Perhaps maintainer-clean
should remove it?

Both of those issues vanish if this is delegated to the tarball
making script; as does the need to cope with a starting point
that isn't a specific commit.  So on the whole I'm leaning to
the idea that it would be better done over there.

            regards, tom lane



Re: Git revision in tarballs

From
Magnus Hagander
Date:
On Thu, Jul 15, 2021 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Magnus Hagander <magnus@hagander.net> writes:
> > On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
> >> The only problem I do see is adding "git" as a new dependency. That
> >> can potentially cause troubles.
>
> > But only for *creating* the tarballs,  and not for using them. I'm not
> > sure what the usecase would be to create a tarball from an environment
> > that doesn't have git?
>
> I agree, this objection seems silly.  If we ever move off of git, the
> process could be adapted at that time.  However, there *is* a reasonable
> question whether this ought to be handled by "make dist" versus the
> tarball-wrapping script.
>
> >> For the file name, I have seen GIT_VERSION or REVISION file names used
> >> before in another projects. Using ".gitrevision" doesn't make sense to
> >> me since it will be hidden on Unix by default and I'm not sure that is
> >> intended.
>
> > It was definitely intended, as I'd assume it's normally a file that
> > most people don't care about, but more something that scripts that
> > verify things would. But I'm more than happy to change it to a
> > different name if that's preferred. I looked around a bit and couldn't
> > find any general consensus for a name for such a file, but I may not
> > have looked carefully enough.
>
> We already have that convention in place:
>
> $ ls -a
> ./                      .gitignore      README.git      contrib/
> ../                     COPYRIGHT       aclocal.m4      doc/
> .dir-locals.el          GNUmakefile     config/         src/
> .editorconfig           GNUmakefile.in  config.log      tmp_install/
> .git/                   HISTORY         config.status*
> .git-blame-ignore-revs  Makefile        configure*
> .gitattributes          README          configure.ac
>
> So ".gitrevision" or the like seems fine to me.
>
> My thoughts about the proposed patch are (1) you'd better have a
> .gitignore entry too, and (2) what is the mechanism that removes
> this file?  It seems weird to have a make rule that makes a
> generated file but none to remove it.  Perhaps maintainer-clean
> should remove it?

maintainer-clean sounds reasonable for that, yes.


> Both of those issues vanish if this is delegated to the tarball
> making script; as does the need to cope with a starting point
> that isn't a specific commit.  So on the whole I'm leaning to
> the idea that it would be better done over there.

I'd be fine with either. The argument for putting it in the makefile
would be, uh, maybe it makes it a tad bit easier to verify builds
because you get it in your local build as well. But it's not like it's
very *hard* to do it...

Putting it in the tarball making script certainly works for me,
though, if that's what people prefer. And that does away with the
"clean" part as that one blows away the whole directory between each
run.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Git revision in tarballs

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Putting it in the tarball making script certainly works for me,
> though, if that's what people prefer. And that does away with the
> "clean" part as that one blows away the whole directory between each
> run.

Actually, we *have* to do it over there, because what that script
does is

   # Export the selected git ref
   git archive ${i} | tar xf - -C pgsql

   cd pgsql
   ./configure
   # some irrelevant stuff
   make dist

So there's no .git subdirectory in the directory it runs "make dist"
in.  Now maybe it'd work anyway because of the GIT_DIR environment
variable, but what I think is more likely is that the file would
end up containing the current master-branch HEAD commit, whereas
the thing we actually want to record here is ${i}.

            regards, tom lane



Re: Git revision in tarballs

From
Magnus Hagander
Date:
On Thu, Jul 15, 2021 at 4:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Magnus Hagander <magnus@hagander.net> writes:
> > Putting it in the tarball making script certainly works for me,
> > though, if that's what people prefer. And that does away with the
> > "clean" part as that one blows away the whole directory between each
> > run.
>
> Actually, we *have* to do it over there, because what that script
> does is
>
>    # Export the selected git ref
>    git archive ${i} | tar xf - -C pgsql
>
>    cd pgsql
>    ./configure
>    # some irrelevant stuff
>    make dist
>
> So there's no .git subdirectory in the directory it runs "make dist"
> in.  Now maybe it'd work anyway because of the GIT_DIR environment
> variable, but what I think is more likely is that the file would
> end up containing the current master-branch HEAD commit, whereas
> the thing we actually want to record here is ${i}.

Hah, yeah, that certainly decides it. And I was even poking around
that script as well today, just nt at the same time :)


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Git revision in tarballs

From
Peter Eisentraut
Date:
On 15.07.21 10:33, Magnus Hagander wrote:
> I think it'd be useful to be able to identify exactly which git commit
> was used to produce a tarball. This would be especially useful when
> downloading snapshot tarballs where that's not entirely clear, but can
> also be used to verify that the release tarballs matches what's
> expected (in the extremely rare case that a tarball is rewrapped for
> example).

Or we could do what git-archive does:

     Additionally the commit ID is stored in a global extended
     pax header if the tar format is used; it can be extracted using git
     get-tar-commit-id. In ZIP files it is stored as
     a file comment.



Re: Git revision in tarballs

From
Daniel Gustafsson
Date:
> On 21 Jul 2021, at 20:25, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 15.07.21 10:33, Magnus Hagander wrote:
>> I think it'd be useful to be able to identify exactly which git commit
>> was used to produce a tarball. This would be especially useful when
>> downloading snapshot tarballs where that's not entirely clear, but can
>> also be used to verify that the release tarballs matches what's
>> expected (in the extremely rare case that a tarball is rewrapped for
>> example).
>
> Or we could do what git-archive does:
>
>    Additionally the commit ID is stored in a global extended
>    pax header if the tar format is used; it can be extracted using git
>    get-tar-commit-id. In ZIP files it is stored as
>    a file comment.

That does adds Git as a dependency for consuming the tarball though, which
might not be a problem but it's a change from what we require today.

--
Daniel Gustafsson        https://vmware.com/




Re: Git revision in tarballs

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 21 Jul 2021, at 20:25, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>> On 15.07.21 10:33, Magnus Hagander wrote:
>>> I think it'd be useful to be able to identify exactly which git commit
>>> was used to produce a tarball.

>> Or we could do what git-archive does:
>> Additionally the commit ID is stored in a global extended
>> pax header if the tar format is used; it can be extracted using git
>> get-tar-commit-id. In ZIP files it is stored as
>> a file comment.

> That does adds Git as a dependency for consuming the tarball though, which
> might not be a problem but it's a change from what we require today.

It also requires keeping the tarball itself around, which you might not
have done, or you might not remember which directory you extracted which
tarball into.  So on the whole that solution seems strictly worse.

FYI, the "put it into .gitrevision" solution is already implemented
in the new tarball-building script that Magnus and I have been
working on off-list.

            regards, tom lane



Re: Git revision in tarballs

From
Daniel Gustafsson
Date:
> On 21 Jul 2021, at 21:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> FYI, the "put it into .gitrevision" solution is already implemented
> in the new tarball-building script that Magnus and I have been
> working on off-list.

+1, I think that's the preferred option.

--
Daniel Gustafsson        https://vmware.com/




Re: Git revision in tarballs

From
Peter Eisentraut
Date:
On 21.07.21 21:12, Daniel Gustafsson wrote:
>> Or we could do what git-archive does:
>>
>>     Additionally the commit ID is stored in a global extended
>>     pax header if the tar format is used; it can be extracted using git
>>     get-tar-commit-id. In ZIP files it is stored as
>>     a file comment.
> 
> That does adds Git as a dependency for consuming the tarball though, which
> might not be a problem but it's a change from what we require today.

How so?



Re: Git revision in tarballs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 21.07.21 21:12, Daniel Gustafsson wrote:
>>> Additionally the commit ID is stored in a global extended
>>> pax header if the tar format is used; it can be extracted using git
>>> get-tar-commit-id. In ZIP files it is stored as
>>> a file comment.

>> That does adds Git as a dependency for consuming the tarball though, which
>> might not be a problem but it's a change from what we require today.

> How so?

It's only a dependency if you want to know the commit ID, which
perhaps isn't something you would have use for if you don't have
git installed ... but I don't think that's totally obvious.

Personally I'd be more worried about rendering the tarballs
totally corrupt from the perspective of somebody using an old
"tar" that hasn't heard of extended pax headers.  Maybe there
are no such versions in the wild anymore; but I do not see any
advantages to this approach that would justify taking any risk for.

            regards, tom lane