Re: make dist using git archive - Mailing list pgsql-hackers

From: Eli Schwartz
Subject: Re: make dist using git archive
Msg-id: 76516f81-31b0-40db-b30e-2fe9e332895a@gmail.com
In response to: Re: make dist using git archive (Peter Eisentraut <peter@eisentraut.org>)
Responses: Re: make dist using git archive
List: pgsql-hackers
Hello, Meson developer here.


On 1/23/24 4:30 AM, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
>> I am not really following why we can't use the builtin Meson dist
>> command. The only difference from my testing is it doesn't use a
>> --prefix argument.
>
> Here are some problems I have identified:
>
> 1. meson dist internally runs gzip without the -n option.  That makes
> the tar.gz archive include a timestamp, which in turn makes it not
> reproducible.


Well, it uses Python's tarfile module, which uses Python's gzip support
under the hood, but yes, that is true: tarfile doesn't expose this tunable.
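For context, `gzip.GzipFile` itself does accept an `mtime` argument; it is `tarfile.open(..., "w:gz")` that never passes one through. A minimal sketch, assuming one wraps the gzip layer by hand (this is not how `meson dist` is written, and `gzip_bytes`/`header_mtime` are hypothetical helpers), showing that pinning `mtime=0` makes the header timestamp constant:

```python
import gzip
import io
from typing import Optional


def gzip_bytes(payload: bytes, mtime: Optional[int]) -> bytes:
    """gzip-compress payload; mtime=None lets GzipFile record the current time."""
    buf = io.BytesIO()
    # filename="" also keeps an output file name out of the header
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def header_mtime(blob: bytes) -> int:
    # RFC 1952: bytes 4..7 of the member header are MTIME, little-endian
    return int.from_bytes(blob[4:8], "little")


pinned = gzip_bytes(b"hello\n", mtime=0)
print(header_mtime(pinned))                          # 0
print(pinned == gzip_bytes(b"hello\n", mtime=0))     # True
print(header_mtime(gzip_bytes(b"hello\n", None)))    # a nonzero Unix timestamp
```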


> 2. Because gzip includes a platform indicator in the archive, the
> produced tar.gz archive is not reproducible across platforms.  (I don't
> know if gzip has an option to avoid that.  git archive uses an internal
> gzip implementation that handles this.)


This appears to be https://github.com/python/cpython/issues/112346


> 3. Meson does not support tar.bz2 archives.


Simple enough to add, but I'm a bit surprised, as people usually seem to
want either gzip for portability or xz for efficient compression.
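On the Python side the compressor is already available: `tarfile` accepts `w:gz`, `w:bz2` and `w:xz` mode strings (bz2 and xz assume the interpreter was built with its `bz2` and `lzma` modules). A sketch of that standard-library capability only; `tar_bytes` is a hypothetical helper, not Meson code:

```python
import io
import tarfile
from typing import Dict


def tar_bytes(files: Dict[str, bytes], compression: str) -> bytes:
    """Build an in-memory tarball; compression is "gz", "bz2" or "xz"."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=f"w:{compression}") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


for comp in ("gz", "bz2", "xz"):
    blob = tar_bytes({"README": b"hi\n"}, comp)
    # "r:*" lets tarfile detect the compression when reading back
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        print(comp, tar.getnames())
```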


> 4. Meson uses git archive internally, but then unpacks and repacks the
> archive, which loses the ability to use git get-tar-commit-id.


What do you use this for? IMO a more robust way to track the commit used
is to use gitattributes export-subst to write a `.git_archival.txt` file
containing the commit sha1 and other info -- this can be read even after
the file is extracted, which means it can also be used to bake the ID
into the built binaries e.g. as part of --version output.
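One common form of this convention (used by setuptools-scm, for example) is a committed `.git_archival.txt` holding placeholders such as `node: $Format:%H$`, plus a `.gitattributes` line `.git_archival.txt export-subst`, so that `git archive` substitutes the real commit hash. A build script could then read the ID back from an extracted tarball. A minimal sketch; the `key: value` layout, the helper names and the short sample hash are assumptions for illustration:

```python
from typing import Dict, Optional


def parse_git_archival(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines from an export-subst-expanded file."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # split on the first colon only; ISO dates in values contain more
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def archived_commit(text: str) -> Optional[str]:
    """Return the commit hash, or None if git never expanded the placeholder."""
    node = parse_git_archival(text).get("node")
    if node is None or node.startswith("$Format:"):
        return None  # e.g. the file was read from a plain checkout
    return node


# made-up, shortened hash for illustration
expanded = "node: 3f1c0de\nnode-date: 2024-01-23T10:30:00+01:00\n"
print(archived_commit(expanded))               # 3f1c0de
print(archived_commit("node: $Format:%H$\n"))  # None
```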


> 5. I have found that the tar archives created by meson and git archive
> include the files in different orders.  I suspect that the Python
> tarfile module introduces either some randomness or a platform dependency.


A different order is meaningless; the question is whether the order is
internally consistent. Python uses sorted() to guarantee a stable order,
which may be a different algorithm than the one git-archive uses to
guarantee a stable order. But the order should be stable, and that is
what matters.
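What makes an archive repeatable is a fixed member order plus normalized per-file metadata, not any particular order. A sketch of that idea with Python's `tarfile` (the helper name and the zeroed owner fields are assumptions, and this is not Meson's implementation): two dicts built in different insertion orders yield byte-identical tar output.

```python
import io
import tarfile
from typing import Dict


def deterministic_tar(files: Dict[str, bytes], mtime: int = 0) -> bytes:
    """Uncompressed tar whose bytes depend only on `files` and `mtime`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        # sorted() gives the same member order however `files` was built
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime          # e.g. the tagged commit's timestamp
            info.mode = 0o644
            info.uid = info.gid = 0     # drop the packager's identity
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


a = deterministic_tar({"src/b.c": b"b", "src/a.c": b"a"})
b = deterministic_tar({"src/a.c": b"a", "src/b.c": b"b"})
print(a == b)  # True
```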


> 6. meson dist is also slower because of the additional work.


I'm amenable to skipping the extraction/recombination of subprojects and
the running of dist scripts in the event that neither exists, as Tristan
offered to do, but...


> 7. meson dist produces .sha256sum files but we have called them .sha256.
>  (This is obviously trivial, but it is something that would need to be
> dealt with somehow nonetheless.)
>
> Most or all of these issues are fixable, either upstream in Meson or by
> adjusting our own requirements.  But for now this route would have some
> significant disadvantages.


Overall, I feel that much of this is about requiring dist tarballs to be
byte-identical to other dist tarballs. Reproducible builds are mainly
about artifacts, not sources, and for sources it doesn't generally
matter unless the sources are ephemeral and generated on demand (in
which case it is indeed very important to produce the same tarball each
time). A tarball is usually generated once, signed, and uploaded to
release hosting. Meson already guarantees that the contents are strictly
based on the tag being built.
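Since the tarball is produced once and then published, its checksum file is just a digest of those final bytes, and the `.sha256` vs `.sha256sum` naming from point 7 reduces to a suffix choice. A hedged sketch; the helpers are hypothetical, and the two-space `sha256sum` layout is an assumption about what the project's `.sha256` files contain:

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256sum_line(chunks: Iterable[bytes], name: str) -> str:
    """One line in the '<hex digest>  <file name>' layout `sha256sum -c` reads."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return f"{digest.hexdigest()}  {name}\n"


def write_checksum(archive: Path, suffix: str = ".sha256") -> Path:
    """Write e.g. foo.tar.gz.sha256 next to foo.tar.gz and return its path."""
    with archive.open("rb") as fh:
        # read in 1 MiB chunks so large tarballs are not loaded at once
        line = sha256sum_line(iter(lambda: fh.read(1 << 20), b""), archive.name)
    out = archive.with_name(archive.name + suffix)
    out.write_text(line)
    return out
```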


--
Eli Schwartz
