[RFC] building postgres with meson - Mailing list pgsql-hackers
Hi,

For the last year or so I've on and off tinkered with $subject. I think it's in a state worth sharing now. First, let's look at a little comparison.

My workstation:

non-cached configure:
current:        11.80s
meson:           6.67s

non-cached build (world-bin):
current:        40.46s
ninja:           7.31s

no-change build:
current:         1.17s
ninja:           0.06s

test world:
current:          105s
meson:             63s

What actually started to motivate me however were the long times windows builds took to come back with test results. On CI, with the same machine config:

build:
current:          202s (doesn't include genbki etc)
meson+ninja:      140s
meson+msbuild:    206s

test:
current:         1323s (many commands)
meson:            903s (single command)

(note that the test comparison isn't quite fair - there are a few tests missing, but it's just small contrib ones afaik)

The biggest difference to me however is not the speed, but how readable the output is.

Running the tests with meson in a terminal shows the number of tests that completed out of how many total, how much time has passed, and how long the currently running tests have already been running.

At the end of a testrun a count of tests is shown:

188/189 postgresql:tap+pg_basebackup / pg_basebackup/t/010_pg_basebackup.pl   OK   39.51s   110 subtests passed
189/189 postgresql:isolation+snapshot_too_old / snapshot_too_old/isolation     OK   62.93s

Ok:                 188
Expected Fail:      0
Fail:               1
Unexpected Pass:    0
Skipped:            0
Timeout:            0

Full log written to /tmp/meson/meson-logs/testlog.txt

The log has the output of the tests and ends with:

Summary of Failures:
120/189 postgresql:tap+recovery / recovery/t/007_sync_rep.pl   ERROR   7.16s   (exitstatus 255 or signal 127 SIGinvalid)

Quite the difference to make check-world -jnn output.

So, now that the teasing is done, let me explain a bit what led me down this path:

Autoconf + make is not being actively developed. Especially autoconf is *barely* in maintenance mode - despite many shortcomings and bugs.
It's also technology that very few want to use - autoconf m4 is scary, and it's scarier for people that started more recently than a lot of us committers for example.

Recursive make as we use it is hard to get right. One reason the clean make build is so slow compared to meson is that we had to resort to .NOTPARALLEL to handle dependencies in a bunch of places. And despite that, I quite regularly see incremental build failures that can be resolved by retrying the build.

While we have incremental builds via --enable-depend, they don't work that reliably (i.e. they miss necessary rebuilds) and yet are often too aggressive. More modern build systems can keep track of the precise command used to build a target and rebuild it when that command changes.

We also don't just have the autoconf / make buildsystem, there's also the msvc project generator - something most of us unix-y folks do not like to touch. I think that, combined with there being no easy way to run all tests, and it being just different, really hurts our windows developer appeal (and subsequently the quality of postgres on windows). I'm not saying this to ding the project generator - that was well before there were decent "meta" buildsystems out there (and in some ways it is a small one itself).

The last big issue I have with the current situation is that there's no good test integration. make check-world output is essentially unreadable / not automatically parseable. Which led to the buildfarm having a separate list of things it needs to test, so that failures can be pinpointed and paired with appropriate logs. That approach unfortunately doesn't scale well to multi-core CPUs, slowing down the buildfarm by a fair bit.

This all led me to experiment with improvements.
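
To illustrate the point above about build systems that track the precise command used to build a target, here is a toy sketch in Python. It is only an illustration under stated assumptions, not how ninja or meson implement this: the function names (command_digest, needs_rebuild, record_build) and the in-memory log are made up for the example.

```python
import hashlib
import json
import os


def command_digest(command):
    """Return a stable hash of the exact command line for a target."""
    return hashlib.sha256(json.dumps(command).encode("utf-8")).hexdigest()


def needs_rebuild(target, sources, command, log):
    """Decide whether `target` is stale.

    `log` maps target path -> digest of the command that last built it
    (a hypothetical in-memory stand-in for a build log kept on disk).
    """
    if not os.path.exists(target):
        return True
    # The command changed (e.g. a different compiler flag): rebuild even
    # though no source file is newer than the target.
    if log.get(target) != command_digest(command):
        return True
    # Classic make-style check: is any source newer than the target?
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def record_build(target, command, log):
    """Remember which command produced `target`."""
    log[target] = command_digest(command)
```

With this, changing -O2 to -O0 in the recorded command marks the object file as stale even though its source is untouched, which a purely timestamp-based make rule would miss.
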
I tried a few somewhat crazy but incremental things like converting our buildsystem to non-recursive make (I got it to build the backend, but it's too hard to do manually I think), or to not run tests during the recursive make check-world, but to append commands to a list of tests, that then is run by a helper (can kinda be made to work).

In the end I concluded that the amount of time we'd need to invest to maintain our more-and-more custom buildsystem going forward doesn't make sense. Which led me to look around and analyze which other buildsystems there are that could make some sense for us. The halfway decent list includes, I think:
1) cmake
2) bazel
3) meson

cmake would be a decent choice, I think. However, I just can't fully warm up to it. Something about it just doesn't quite sit right with me. That's not a good enough reason to prevent others from suggesting to use it, but it's good enough to justify not investing a lot of time in it myself.

Bazel has some nice architectural properties. But it requires a JVM to run - I think that basically makes it unsuitable for us. And the build information seems quite arduous to maintain too.

Which left me with meson. It is a meta-buildsystem that can do the actual work of building via ninja (the most common one, also targeted by cmake), msbuild (visual studio project files, important for GUI work) and xcode projects (I assume that's for a macos IDE, but I haven't tried to use it). Meson roughly does what autoconf+automake did, in a python-esque DSL, and outputs build-instructions for ninja / msbuild / xcode. One interesting bit is that meson itself is written in python (and fairly easy to contribute to - I got a few changes in now).

I don't think meson is perfect architecturally - e.g. its insistence on not having functions ends up making it a bit harder to not end up duplicating code. There are some user-interface oddities that are now hard to fix fully, due to the fairly wide usage.
But all-in-all it's pretty nice to use.

It's worth calling out that a lot of large open source projects have been / are migrating to meson. qemu/kvm, mesa (core part of graphics stack on linux and also widely used in other platforms), a good chunk of GNOME, and quite a few more. Due to that it seems unlikely to be abandoned soon.

As far as I can tell the only OS that postgres currently supports that meson doesn't support is HPUX. It'd likely be fairly easy to add gcc-on-hpux support, a chunk more to add support for the proprietary ones.

The attached patch (meson support is 0016, the rest is prerequisites that aren't that interesting at this stage) converts most of postgres to meson. There are a few missing contrib modules, only about half the optional library dependencies are implemented, and I've only built on x64. It builds on freebsd, linux, macos and windows (both ninja and msbuild) and cross builds from linux to windows. Thomas helped make the freebsd / macos pieces a reality, thanks!

I took a number of shortcuts (although there used to be a *lot* more). So this shouldn't be reviewed to the normal standard of the community - it's a prototype. But I think it's in a complete enough shape that it allows for a well-informed evaluation.

What doesn't yet work / build:

- plenty of optional libraries, contrib, NLS, docs build

- PGXS - and I don't yet know what best to do about it. One backward-compatible way would be to continue to use makefiles for pgxs, but do the necessary replacement of Makefile.global.in via meson (and not use that for postgres' own build). But that doesn't really provide a nicer path for building postgres extensions on windows, so it'd definitely not be a long-term path.

- JIT bitcode generation for anything but src/backend.

- anything but modern-ish x86. That's probably a small amount of work, but something that needs to be done.

- exporting all symbols for extension modules on windows (the stuff for postgres is implemented).
  Instead I marked the relevant symbols as declspec(dllexport). I think we should do that regardless of the buildsystem change. Restricting symbol visibility via gcc's -fvisibility=hidden for extensions results in a substantially reduced number of exported symbols, and even reduces object size (and I think improves the code too). I'll send an email about that separately.

There's a lot more stuff to talk about, but I'll stop with a small bit of instructions below:

Demo / instructions:

# Get code
git remote add andres git@github.com:anarazel/postgres.git
git fetch andres
git checkout --track andres/meson

# setup build directory
meson setup build --buildtype debug
cd build

# build (uses automatically as many cores as available)
ninja

# change configuration, build again
meson configure -Dssl=openssl
ninja

# run all tests
meson test

# run just recovery tests
meson test --suite setup --suite recovery

# list tests
meson test --list

Greetings,

Andres Freund

Attachment
- v3-0001-ci-backend-windows-DONTMERGE-crash-reporting-back.patch
- v3-0002-ci-Add-CI-for-FreeBSD-Linux-MacOS-and-Windows-uti.patch
- v3-0003-fixup-ci-Add-CI-for-FreeBSD-Linux-MacOS-and-Windo.patch
- v3-0004-meson-prereq-output-and-depencency-tracking-work.patch
- v3-0005-meson-prereq-move-snowball_create.sql-creation-in.patch
- v3-0006-meson-prereq-add-output-path-arg-in-generate-lwlo.patch
- v3-0007-meson-prereq-add-src-tools-gen_versioning_script..patch
- v3-0008-meson-prereq-generate-errcodes.pl-accept-output-f.patch
- v3-0009-meson-prereq-remove-unhelpful-chattiness-in-snowb.patch
- v3-0010-meson-prereq-Can-we-get-away-with-not-export-all-.patch
- v3-0011-meson-prereq-Handle-DLSUFFIX-in-msvc-builds-simil.patch
- v3-0012-prereq-Move-sed-expression-from-regress-python3-m.patch
- v3-0013-Adapt-src-test-ldap-t-001_auth.pl-to-work-with-op.patch
- v3-0014-wip-don-t-run-ldap-tests-on-windows.patch
- v3-0015-wip-split-TESTDIR-into-two.patch
- v3-0016-meson-Add-draft-of-a-meson-based-buildsystem.patch
- v3-0017-ci-Build-both-with-meson-and-as-before.patch