Re: make dist using git archive - Mailing list pgsql-hackers

From Eli Schwartz
Subject Re: make dist using git archive
Msg-id 39a8ff30-a046-4dd2-847a-5c33e9d24031@gmail.com
In response to Re: make dist using git archive  (Peter Eisentraut <peter@eisentraut.org>)
Responses Re: make dist using git archive
List pgsql-hackers
On 1/31/24 3:03 AM, Peter Eisentraut wrote:
>> What do you use this for? IMO a more robust way to track the commit used
>> is to use gitattributes export-subst to write a `.git_archival.txt` file
>> containing the commit sha1 and other info -- this can be read even after
>> the file is extracted, which means it can also be used to bake the ID
>> into the built binaries e.g. as part of --version output.
>
> It's a marginal use case, for sure.  But it is something that git
> provides tooling for that is universally available.  Any alternative
> would be an ad-hoc solution that is specific to our project and would be
> different for the next project.


Mercurial has the "archivemeta" config setting, which exports similar
information but forces the filename ".hg_archival.txt".

The setuptools-scm project follows this pattern by requiring the git
file to be called ".git_archival.txt", with a fixed format mimicking the
hg one:

https://setuptools-scm.readthedocs.io/en/latest/usage/#git-archives


So, I guess you could use this and then it would not be specific to your
project. :)
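To illustrate the consuming side, here is a minimal Python sketch (the function names are hypothetical). It assumes a `.git_archival.txt` at the top of the tree, marked `export-subst` in `.gitattributes` and containing lines such as `node: $Format:%H$`. `git archive` expands such placeholders only when exporting, so a leftover `$Format:` means the file came from a plain checkout. The `node` key follows the setuptools-scm convention as I understand it; see the page linked above for its exact recommended fields. The same `key: value` parsing should also work for Mercurial's `.hg_archival.txt`.

```python
from pathlib import Path
from typing import Optional


def read_archival(text: str) -> dict[str, str]:
    """Parse "key: value" lines as found in .git_archival.txt / .hg_archival.txt."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as dates contain colons.
        key, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields


def archived_commit(text: str) -> Optional[str]:
    """Return the commit id written by export-subst, or None if unexpanded."""
    node = read_archival(text).get("node")
    # In a plain git checkout the placeholder is left verbatim, e.g. "$Format:%H$".
    if node is None or node.startswith("$Format:"):
        return None
    return node


if __name__ == "__main__":
    path = Path(".git_archival.txt")  # assumed location at the top of the tree
    if path.exists():
        print(archived_commit(path.read_text()))
```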


>> Overall I feel like much of this is about requiring dist tarballs to be
>> byte-identical to other dist tarballs, although reproducible builds is
>> mainly about artifacts, not sources, and for sources it doesn't
>> generally matter unless the sources are ephemeral and generated
>> on-demand (in which case it is indeed very important to produce the same
>> tarball each time).
>
> The source tarball is, in a way, also an artifact.
>
> I think it's useful that others can easily independently verify that the
> produced tarball matches what they have locally.  It's not an absolute
> requirement, but given that it is possible, it seems useful to take
> advantage of it.
>
> In a way, this also avoids the need for signing the tarball, which we
> don't do.  So maybe that contributes to a different perspective.


Since you mention signing, and not just as an aside...

That's a fascinating perspective. I wonder how people independently
verify that what they have locally (I assume from git clones) matches
what the postgres committers have authorized.

I'm a bit skeptical that you can avoid the need for code signing at
some stage, somewhere, somehow, by suggesting that people can simply git
clone, run some commands, and compare the tarball. The point of signing
is to verify that no one has acquired an untraceable API token they
should not have, gained write access to the authoritative server, and
then uploaded malicious code under various forged identities (possibly
overwriting previous versions), either in git or outside of it.
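Concretely, "run some commands and compare the tarball" boils down to a byte-for-byte check like the following Python sketch (paths are supplied by the caller; the helper names are hypothetical). A match shows only that both tarballs were built from the same tree. If the repository you cloned from is the one that was tampered with, your local tarball matches the published one perfectly, and nothing in the comparison says who authorized the contents.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tarballs_match(local: Path, published: Path) -> bool:
    # Byte identity only: says nothing about who authorized the contents.
    return sha256_of(local) == sha256_of(published)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(tarballs_match(Path(sys.argv[1]), Path(sys.argv[2])))
```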

Ideally, git commits would be signed, but that requires large numbers
of people to have security-minded git commit habits. From a quick check
of the postgres commit logs, only one person seems to be regularly
signing commits. That does provide a certain measure of protection: an
attacker cannot use `git push --force` to rewrite history across a
signed commit without it being noticed, and those commits serve as
verifiable states that multiple people have seen.
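The kind of quick check mentioned above can be sketched in Python by asking `git log` for each commit's signature status (`%G?`) and committer name (`%cn`), separated by a tab (`%x09`). `%G?` prints `N` for a commit without a signature; any other letter means one is present, whether or not it could be verified against a trusted key. The helper names and the 1000-commit window are arbitrary choices for illustration; computing `%G?` runs the signature tooling on every signed commit, so it is slow on long histories.

```python
import subprocess
import sys
from collections import Counter


def tally_signed(log_output: str) -> dict[str, tuple[int, int]]:
    """Map committer name -> (signed commits, total commits).

    Expects lines of the form "<%G?>\t<%cn>", where %G? is "N" when unsigned.
    """
    signed: Counter[str] = Counter()
    total: Counter[str] = Counter()
    for line in log_output.splitlines():
        status, sep, name = line.partition("\t")
        if not sep:
            continue  # skip blank lines
        total[name] += 1
        if status != "N":  # any other letter means a signature is present
            signed[name] += 1
    return {name: (signed[name], count) for name, count in total.items()}


def git_signature_log(repo: str, max_count: int = 1000) -> str:
    # %G? = signature status, %x09 = tab, %cn = committer name
    return subprocess.run(
        ["git", "-C", repo, "log", f"--max-count={max_count}", "--format=%G?%x09%cn"],
        check=True, capture_output=True, text=True,
    ).stdout


if __name__ == "__main__":
    if len(sys.argv) == 2:
        tally = tally_signed(git_signature_log(sys.argv[1]))
        for name, (n_signed, n_total) in sorted(tally.items()):
            print(f"{name}: {n_signed}/{n_total} signed")
```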

The tags aren't signed either, which is a big issue for verifiably
identifying the release artifacts published by the release manager. Even
if not every commit is signed, having signed tags provides a known
coordination point of code that has been broadly tested and code-signed
for mass use.
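On the consumer side, checking a release tag is roughly a two-step affair, sketched here in Python with hypothetical helper names. `git cat-file tag <name>` prints the raw tag object (it fails for lightweight tags, which have no object of their own), and the armor header shows whether a signature is embedded at all. Only `git verify-tag`, which checks the signature against keys you have chosen to trust and exits non-zero on failure, actually establishes anything.

```python
import subprocess
from typing import Optional

# Armor headers git recognizes for signatures embedded in objects.
_SIGNATURE_MARKERS = {
    "-----BEGIN PGP SIGNATURE-----": "openpgp",
    "-----BEGIN PGP MESSAGE-----": "openpgp",
    "-----BEGIN SIGNED MESSAGE-----": "x509",
    "-----BEGIN SSH SIGNATURE-----": "ssh",
}


def tag_object(repo: str, tag: str) -> str:
    """Raw annotated-tag object; raises CalledProcessError for lightweight tags."""
    return subprocess.run(
        ["git", "-C", repo, "cat-file", "tag", tag],
        check=True, capture_output=True, text=True,
    ).stdout


def tag_signature_kind(raw: str) -> Optional[str]:
    """Report which kind of signature block is embedded, or None if unsigned.

    This only detects presence; it does not check the signature.
    """
    for line in raw.splitlines():
        kind = _SIGNATURE_MARKERS.get(line.strip())
        if kind is not None:
            return kind
    return None


def tag_verifies(repo: str, tag: str) -> bool:
    # git verify-tag checks the signature against keys you trust
    # (GnuPG keyring, or gpg.ssh.allowedSignersFile for SSH) and
    # exits non-zero if verification fails.
    result = subprocess.run(["git", "-C", repo, "verify-tag", tag], capture_output=True)
    return result.returncode == 0
```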

...

In summary, my opinion is that using git-get-tar-commit-id provides zero
security guarantees. If that's not something you are worried about,
that's one thing; but if you were expecting it to *replace* signing the
tarball, that's very much another thing entirely, and not one I can
agree with at all.
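For reference, what `git get-tar-commit-id` reads is the commit ID that `git archive` stores as a `comment` record in the pax global extended header of a tar archive (when archiving a commit rather than a bare tree). A Python equivalent using the standard `tarfile` module is sketched below; `tar_commit_id` is a hypothetical name.

```python
import sys
import tarfile
from typing import Optional


def tar_commit_id(path: str) -> Optional[str]:
    """Return the commit id git archive recorded in the pax global header."""
    # "r:*" transparently handles plain, gzip, bzip2 and xz tarballs.
    with tarfile.open(path, mode="r:*") as tf:
        # Opening reads the first member, which populates tf.pax_headers
        # from any leading global extended header.
        return tf.pax_headers.get("comment")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(tar_commit_id(sys.argv[1]))
```

Note that nothing ties that header to git: a tarball written with `tarfile.open(..., format=tarfile.PAX_FORMAT, pax_headers={"comment": ...})` carries whatever ID its author chose, which is the crux of why reading it back cannot stand in for a signature.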



--
Eli Schwartz
