Thread: Git revision in tarballs
I think it'd be useful to be able to identify exactly which git commit was used to produce a tarball. This would be especially useful when downloading snapshot tarballs where that's not entirely clear, but can also be used to verify that the release tarballs matches what's expected (in the extremely rare case that a tarball is rewrapped for example). What do people think of the attached? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Attachment
čt 15. 7. 2021 v 10:33 odesílatel Magnus Hagander <magnus@hagander.net> napsal: > > I think it'd be useful to be able to identify exactly which git commit > was used to produce a tarball. This would be especially useful when > downloading snapshot tarballs where that's not entirely clear, but can > also be used to verify that the release tarballs matches what's > expected (in the extremely rare case that a tarball is rewrapped for > example). > > What do people think of the attached? The only problem I do see is adding "git" as a new dependency. That can potentially cause troubles. For the file name, I have seen GIT_VERSION or REVISION file names used before in another projects. Using ".gitrevision" doesn't make sense to me since it will be hidden on Unix by default and I'm not sure that is intended. > -- > Magnus Hagander > Me: https://www.hagander.net/ > Work: https://www.redpill-linpro.com/
On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote: > > čt 15. 7. 2021 v 10:33 odesílatel Magnus Hagander <magnus@hagander.net> napsal: > > > > I think it'd be useful to be able to identify exactly which git commit > > was used to produce a tarball. This would be especially useful when > > downloading snapshot tarballs where that's not entirely clear, but can > > also be used to verify that the release tarballs matches what's > > expected (in the extremely rare case that a tarball is rewrapped for > > example). > > > > What do people think of the attached? > > The only problem I do see is adding "git" as a new dependency. That > can potentially cause troubles. But only for *creating* the tarballs, and not for using them. I'm not sure what the usecase would be to create a tarball from an environment that doesn't have git? > For the file name, I have seen GIT_VERSION or REVISION file names used > before in another projects. Using ".gitrevision" doesn't make sense to > me since it will be hidden on Unix by default and I'm not sure that is > intended. It was definitely intended, as I'd assume it's normally a file that most people don't care about, but more something that scripts that verify things would. But I'm more than happy to change it to a different name if that's preferred. I looked around a bit and couldn't find any general consensus for a name for such a file, but I may not have looked carefully enough. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Thu, Jul 15, 2021 at 01:44:45PM +0200, Magnus Hagander wrote: > But only for *creating* the tarballs, and not for using them. I'm not > sure what the usecase would be to create a tarball from an environment > that doesn't have git? Which would likely mean somebody creating a release tarball in an environment doing a build with what is already a release tarball. Adding a dependency to git in this code path does not sound that bad to me, FWIW. -- Michael
Attachment
Magnus Hagander <magnus@hagander.net> writes: > On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote: >> The only problem I do see is adding "git" as a new dependency. That >> can potentially cause troubles. > But only for *creating* the tarballs, and not for using them. I'm not > sure what the usecase would be to create a tarball from an environment > that doesn't have git? I agree, this objection seems silly. If we ever move off of git, the process could be adapted at that time. However, there *is* a reasonable question whether this ought to be handled by "make dist" versus the tarball-wrapping script. >> For the file name, I have seen GIT_VERSION or REVISION file names used >> before in another projects. Using ".gitrevision" doesn't make sense to >> me since it will be hidden on Unix by default and I'm not sure that is >> intended. > It was definitely intended, as I'd assume it's normally a file that > most people don't care about, but more something that scripts that > verify things would. But I'm more than happy to change it to a > different name if that's preferred. I looked around a bit and couldn't > find any general consensus for a name for such a file, but I may not > have looked carefully enough. We already have that convention in place: $ ls -a ./ .gitignore README.git contrib/ ../ COPYRIGHT aclocal.m4 doc/ .dir-locals.el GNUmakefile config/ src/ .editorconfig GNUmakefile.in config.log tmp_install/ .git/ HISTORY config.status* .git-blame-ignore-revs Makefile configure* .gitattributes README configure.ac So ".gitrevision" or the like seems fine to me. My thoughts about the proposed patch are (1) you'd better have a .gitignore entry too, and (2) what is the mechanism that removes this file? It seems weird to have a make rule that makes a generated file but none to remove it. Perhaps maintainer-clean should remove it? Both of those issues vanish if this is delegated to the tarball making script; as does the need to cope with a starting point that isn't a specific commit. So on the whole I'm leaning to the idea that it would be better done over there. regards, tom lane
On Thu, Jul 15, 2021 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Magnus Hagander <magnus@hagander.net> writes: > > On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote: > >> The only problem I do see is adding "git" as a new dependency. That > >> can potentially cause troubles. > > > But only for *creating* the tarballs, and not for using them. I'm not > > sure what the usecase would be to create a tarball from an environment > > that doesn't have git? > > I agree, this objection seems silly. If we ever move off of git, the > process could be adapted at that time. However, there *is* a reasonable > question whether this ought to be handled by "make dist" versus the > tarball-wrapping script. > > >> For the file name, I have seen GIT_VERSION or REVISION file names used > >> before in another projects. Using ".gitrevision" doesn't make sense to > >> me since it will be hidden on Unix by default and I'm not sure that is > >> intended. > > > It was definitely intended, as I'd assume it's normally a file that > > most people don't care about, but more something that scripts that > > verify things would. But I'm more than happy to change it to a > > different name if that's preferred. I looked around a bit and couldn't > > find any general consensus for a name for such a file, but I may not > > have looked carefully enough. > > We already have that convention in place: > > $ ls -a > ./ .gitignore README.git contrib/ > ../ COPYRIGHT aclocal.m4 doc/ > .dir-locals.el GNUmakefile config/ src/ > .editorconfig GNUmakefile.in config.log tmp_install/ > .git/ HISTORY config.status* > .git-blame-ignore-revs Makefile configure* > .gitattributes README configure.ac > > So ".gitrevision" or the like seems fine to me. > > My thoughts about the proposed patch are (1) you'd better have a > .gitignore entry too, and (2) what is the mechanism that removes > this file? It seems weird to have a make rule that makes a > generated file but none to remove it. Perhaps maintainer-clean > should remove it? maintainer-clean sounds reasonable for that, yes. > Both of those issues vanish if this is delegated to the tarball > making script; as does the need to cope with a starting point > that isn't a specific commit. So on the whole I'm leaning to > the idea that it would be better done over there. I'd be fine with either. The argument for putting it in the makefile would be, uh, maybe it makes it a tad bit easier to verify builds because you get it in your local build as well. But it's not like it's very *hard* to do it... Putting it in the tarball making script certainly works for me, though, if that's what people prefer. And that does away with the "clean" part as that one blows away the whole directory between each run. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > Putting it in the tarball making script certainly works for me, > though, if that's what people prefer. And that does away with the > "clean" part as that one blows away the whole directory between each > run. Actually, we *have* to do it over there, because what that script does is # Export the selected git ref git archive ${i} | tar xf - -C pgsql cd pgsql ./configure # some irrelevant stuff make dist So there's no .git subdirectory in the directory it runs "make dist" in. Now maybe it'd work anyway because of the GIT_DIR environment variable, but what I think is more likely is that the file would end up containing the current master-branch HEAD commit, whereas the thing we actually want to record here is ${i}. regards, tom lane
On Thu, Jul 15, 2021 at 4:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Magnus Hagander <magnus@hagander.net> writes: > > Putting it in the tarball making script certainly works for me, > > though, if that's what people prefer. And that does away with the > > "clean" part as that one blows away the whole directory between each > > run. > > Actually, we *have* to do it over there, because what that script > does is > > # Export the selected git ref > git archive ${i} | tar xf - -C pgsql > > cd pgsql > ./configure > # some irrelevant stuff > make dist > > So there's no .git subdirectory in the directory it runs "make dist" > in. Now maybe it'd work anyway because of the GIT_DIR environment > variable, but what I think is more likely is that the file would > end up containing the current master-branch HEAD commit, whereas > the thing we actually want to record here is ${i}. Hah, yeah, that certainly decides it. And I was even poking around that script as well today, just nt at the same time :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On 15.07.21 10:33, Magnus Hagander wrote: > I think it'd be useful to be able to identify exactly which git commit > was used to produce a tarball. This would be especially useful when > downloading snapshot tarballs where that's not entirely clear, but can > also be used to verify that the release tarballs matches what's > expected (in the extremely rare case that a tarball is rewrapped for > example). Or we could do what git-archive does: Additionally the commit ID is stored in a global extended pax header if the tar format is used; it can be extracted using git get-tar-commit-id. In ZIP files it is stored as a file comment.
> On 21 Jul 2021, at 20:25, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 15.07.21 10:33, Magnus Hagander wrote: >> I think it'd be useful to be able to identify exactly which git commit >> was used to produce a tarball. This would be especially useful when >> downloading snapshot tarballs where that's not entirely clear, but can >> also be used to verify that the release tarballs matches what's >> expected (in the extremely rare case that a tarball is rewrapped for >> example). > > Or we could do what git-archive does: > > Additionally the commit ID is stored in a global extended > pax header if the tar format is used; it can be extracted using git > get-tar-commit-id. In ZIP files it is stored as > a file comment. That does adds Git as a dependency for consuming the tarball though, which might not be a problem but it's a change from what we require today. -- Daniel Gustafsson https://vmware.com/
Daniel Gustafsson <daniel@yesql.se> writes: > On 21 Jul 2021, at 20:25, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: >> On 15.07.21 10:33, Magnus Hagander wrote: >>> I think it'd be useful to be able to identify exactly which git commit >>> was used to produce a tarball. >> Or we could do what git-archive does: >> Additionally the commit ID is stored in a global extended >> pax header if the tar format is used; it can be extracted using git >> get-tar-commit-id. In ZIP files it is stored as >> a file comment. > That does adds Git as a dependency for consuming the tarball though, which > might not be a problem but it's a change from what we require today. It also requires keeping the tarball itself around, which you might not have done, or you might not remember which directory you extracted which tarball into. So on the whole that solution seems strictly worse. FYI, the "put it into .gitrevision" solution is already implemented in the new tarball-building script that Magnus and I have been working on off-list. regards, tom lane
> On 21 Jul 2021, at 21:23, Tom Lane <tgl@sss.pgh.pa.us> wrote: > FYI, the "put it into .gitrevision" solution is already implemented > in the new tarball-building script that Magnus and I have been > working on off-list. +1, I think that's the preferred option. -- Daniel Gustafsson https://vmware.com/
On 21.07.21 21:12, Daniel Gustafsson wrote: >> Or we could do what git-archive does: >> >> Additionally the commit ID is stored in a global extended >> pax header if the tar format is used; it can be extracted using git >> get-tar-commit-id. In ZIP files it is stored as >> a file comment. > > That does adds Git as a dependency for consuming the tarball though, which > might not be a problem but it's a change from what we require today. How so?
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > On 21.07.21 21:12, Daniel Gustafsson wrote: >>> Additionally the commit ID is stored in a global extended >>> pax header if the tar format is used; it can be extracted using git >>> get-tar-commit-id. In ZIP files it is stored as >>> a file comment. >> That does adds Git as a dependency for consuming the tarball though, which >> might not be a problem but it's a change from what we require today. > How so? It's only a dependency if you want to know the commit ID, which perhaps isn't something you would have use for if you don't have git installed ... but I don't think that's totally obvious. Personally I'd be more worried about rendering the tarballs totally corrupt from the perspective of somebody using an old "tar" that hasn't heard of extended pax headers. Maybe there are no such versions in the wild anymore; but I do not see any advantages to this approach that would justify taking any risk for. regards, tom lane