Re: make dist using git archive - Mailing list pgsql-hackers
From | Eli Schwartz |
---|---|
Subject | Re: make dist using git archive |
Date | |
Msg-id | 76516f81-31b0-40db-b30e-2fe9e332895a@gmail.com |
In response to | Re: make dist using git archive (Peter Eisentraut <peter@eisentraut.org>) |
Responses | Re: make dist using git archive |
List | pgsql-hackers |
Hello, meson developer here.

On 1/23/24 4:30 AM, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
>> I am not really following why we can't use the builtin Meson dist
>> command. The only difference from my testing is it doesn't use a
>> --prefix argument.
>
> Here are some problems I have identified:
>
> 1. meson dist internally runs gzip without the -n option. That makes
> the tar.gz archive include a timestamp, which in turn makes it not
> reproducible.

Well, it uses python tarfile which uses python gzip support under the
hood, but yes, that is true, python tarfile doesn't expose this tunable.

> 2. Because gzip includes a platform indicator in the archive, the
> produced tar.gz archive is not reproducible across platforms. (I don't
> know if gzip has an option to avoid that. git archive uses an internal
> gzip implementation that handles this.)

This appears to be https://github.com/python/cpython/issues/112346

> 3. Meson does not support tar.bz2 archives.

Simple enough to add, but I'm a bit surprised as usually people seem to
want either gzip for portability or xz for efficient compression.

> 4. Meson uses git archive internally, but then unpacks and repacks the
> archive, which loses the ability to use git get-tar-commit-id.

What do you use this for? IMO a more robust way to track the commit used
is to use gitattributes export-subst to write a `.git_archival.txt` file
containing the commit sha1 and other info -- this can be read even after
the file is extracted, which means it can also be used to bake the ID
into the built binaries e.g. as part of --version output.

> 5. I have found that the tar archives created by meson and git archive
> include the files in different orders. I suspect that the Python
> tarfile module introduces some either randomness or platform dependency.

Different orders is meaningless, the question is whether the order is
internally consistent. Python uses sorted() to guarantee a stable order,
which may be a different algorithm than the one git-archive uses to
guarantee a stable order. But the order should be stable and that is
what matters.

> 6. meson dist is also slower because of the additional work.

I'm amenable to skipping the extraction/recombination of subprojects and
running of dist scripts in the event that neither exist, as Tristan
offered to do, but...

> 7. meson dist produces .sha256sum files but we have called them .sha256.
> (This is obviously trivial, but it is something that would need to be
> dealt with somehow nonetheless.)
>
> Most or all of these issues are fixable, either upstream in Meson or by
> adjusting our own requirements. But for now this route would have some
> significant disadvantages.

Overall I feel like much of this is about requiring dist tarballs to be
byte-identical to other dist tarballs, although reproducible builds is
mainly about artifacts, not sources, and for sources it doesn't
generally matter unless the sources are ephemeral and generated
on-demand (in which case it is indeed very important to produce the same
tarball each time).

A tarball is usually generated once, signed, and uploaded to release
hosting. Meson already guarantees the contents are strictly based on the
built tag.

--
Eli Schwartz
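
For reference, points 1 and 5 can be illustrated with a small Python sketch. This is not what meson dist does internally; it only shows that the standard library can produce a tar.gz with no timestamp in the gzip header and a stable member order, by wrapping tarfile in an explicit gzip.GzipFile. The function name build_tar_gz and the in-memory dict of files are hypothetical, chosen for illustration. It does not attempt to address the platform indicator from point 2.

```python
import gzip
import io
import tarfile


def build_tar_gz(files: dict[str, bytes]) -> bytes:
    """Return a tar.gz built from an in-memory {name: content} mapping.

    Hypothetical helper for illustration; not part of Meson or PostgreSQL.
    """
    buf = io.BytesIO()
    # mtime=0 keeps the current time out of the gzip header (the effect of
    # `gzip -n`); filename="" keeps an original file name out of it as well.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            # sorted() gives a stable member order regardless of dict order.
            for name in sorted(files):
                data = files[name]
                info = tarfile.TarInfo(name)
                info.size = len(data)
                info.mtime = 0      # fixed member timestamp
                info.mode = 0o644   # fixed permissions
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Calling this twice with the same contents, even with the mapping built in a different order, should give byte-identical output on the same Python version; whether it also matches across Python versions or platforms is a separate question.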