Thread: Re: CI and test improvements
On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote: > I'm "joining" a bunch of unresolved threads hoping to present them better since > they're related and I'm maintaining them together anyway. This resolves an error with libpq tests in an intermediate commit, probably caused by rebasing (and maybe hidden by the fact that the tests weren't being run). And updates ccache to avoid CCACHE_COMPILER. Should any of the test changes go into v15 ? > Subject: [PATCH 02/19] cirrus/vcregress: test modules/contrib with NO_INSTALLCHECK=1 > Subject: [PATCH 08/19] vcregress: add alltaptests > Subject: [PATCH 14/19] Move libpq_pipeline test into src/interfaces/libpq. > Subject: [PATCH 15/19] msvc: do not install libpq test tools by default I also added: cirrus/ccache: disable compression and show stats Since v4.0, ccache enables zstd compression by default, saving roughly 2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable ccache compression, allowing cirrus to gzip the uncompressed data (better than ccache's default of zstd-1). https://cirrus-ci.com/build/5194596123672576 debian/bullseye has ccache 4.2; cirrus says 92MB cache after a single compilation; cache_size_kibibyte 102000 macos: has 4.5.1: 46MB cache; cache_size_kibibyte 51252 freebsd: has 3.7.12: 41MB cache; cache_size_kibibyte 130112 For some reason, mac's ccache uses 2x less space than either freesbsd or linux. Linux is ~30% larger. Freebsd uploads an artifact 3x smaller than the size ccache reports, because its ccache is old so doesn't enable compression by default. I've also sent some patches to Thomas for cfbot to help progress some of these patches (code coverage and documentation upload as artifacts). https://github.com/justinpryzby/cfbot/commits/master -- Justin
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-vcregress-test-modules-contrib-with-NO_INSTAL.patch
- 0003-set-TESTDIR-from-src-test-perl-rather-than-Makefile-.patch
- 0004-s-also-remove-PATH.patch
- 0005-cirrus-macos-enable-various-runtime-checks.patch
- 0006-cirrus-linux-compile-with-fsanitize.patch
- 0007-cirrus-windows-increase-timeout-to-25min.patch
- 0008-vcregress-add-alltaptests.patch
- 0009-tmp-run-tap-tests-first.patch
- 0010-vcregress-run-alltaptests-in-parallel.patch
- 0011-cirrus-make-DebugInformationFormat-OldStyle-for-CI-b.patch
- 0012-cirrus-windows-ccache.patch
- 0013-cirrus-ccache-disable-compression-and-show-stats.patch
- 0014-another-way-to-install-uri-regress-uri_regress.patch
- 0015-Move-libpq_pipeline-test-into-src-interfaces-libpq.patch
- 0016-msvc-do-not-install-libpq-test-tools-by-default.patch
- 0017-cirrus-code-coverage.patch
- 0018-cirrus-build-docs-as-a-separate-task.patch
- 0019-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0020-f-html-index-file.patch
[Resending -- sorry if you receive this twice. Jacob's mail server has apparently offended the list management software so emails bounce if he's in CC.] On Fri, Jun 24, 2022 at 7:23 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > Freebsd uploads an artifact 3x smaller than the size ccache reports, because > its ccache is old so doesn't enable compression by default. That port/package got stuck on 3.x because of some dependency problems when using in bootstrapping FreeBSD itself (or other ports?), apparently. I didn't follow the details but the recent messages here sound hopeful and I'm keeping my eye on it to see if 4.x lands as a separate package we'd need to install, or something. Fingers crossed! https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234971 > I've also sent some patches to Thomas for cfbot to help progress some of these > patches (code coverage and documentation upload as artifacts). > https://github.com/justinpryzby/cfbot/commits/master Thanks, sorry for lack of action, will get to these soon.
Resending with a problematic address removed... On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote: > I'm anticipating the need to further re-arrange the patch set - it's not clear > which patches should go first. Maybe some patches should be dropped in favour > of the meson project. I guess some patches will have to be re-implemented with > meson (msvc warnings). > Maybe the improvements to vcregress should go into v15 ? CI should run all the > tests (which also serves to document *how* to run all the tests, even if there > isn't a literal check-world target). On Thu, Jun 23, 2022 at 02:31:25PM -0500, Justin Pryzby wrote: > Should any of the test changes go into v15 ? On Thu, Jul 07, 2022 at 07:22:32PM -0500, Justin Pryzby wrote: > Also, cirrus/freebsd task can run 3x faster with more CPUs. Checking if there's interest in any/none of these patches ? I have added several more. Do you have an idea when the meson branch might be merged ? Will vcregress remain for a while, or will it go away for v16 ? -- Justin
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-vcregress-test-modules-contrib-with-NO_INSTAL.patch
- 0003-cirrus-ccache-disable-compression-and-show-stats.patch
- 0004-cirrus-ccache-add-explicit-cache-keys.patch
- 0005-silence-make-distprep-and-generated-headers.patch
- 0006-pg_regress-run-more-quietly.patch
- 0007-cirrus-enable-various-runtime-checks-on-macos-and-fr.patch
- 0008-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch
- 0009-cirrus-freebsd-run-build-check-in-a-make-vpath.patch
- 0010-cirrus-windows-increase-timeout-to-25min.patch
- 0011-vcregress-add-alltaptests.patch
- 0012-tmp-run-tap-tests-first.patch
- 0013-set-TESTDIR-from-src-test-perl-rather-than-Makefile-.patch
- 0014-s-also-set-PATH.patch
- 0015-f-and-chdir.patch
- 0016-vcregress-run-alltaptests-in-parallel.patch
- 0017-another-way-to-install-uri_regress-testclient.patch
- 0018-Move-libpq_pipeline-test-into-src-interfaces-libpq.patch
- 0019-cirrus-linux-compile-with-fsanitize.patch
- 0020-cirrus-code-coverage.patch
- 0021-cirrus-build-docs-as-a-separate-task.patch
- 0022-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0023-f-html-index-file.patch
- 0024-cirrus-warnings-use-.-configure-cache-in-headerschec.patch
- 0025-cirrus-warnings-move-use-a-single-common-always-bloc.patch
Hi, On 2022-08-28 09:44:47 -0500, Justin Pryzby wrote: > On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote: > > I'm anticipating the need to further re-arrange the patch set - it's not clear > > which patches should go first. Maybe some patches should be dropped in favour > > of the meson project. I guess some patches will have to be re-implemented with > > meson (msvc warnings). > > > Maybe the improvements to vcregress should go into v15 ? CI should run all the > > tests (which also serves to document *how* to run all the tests, even if there > > isn't a literal check-world target). > > On Thu, Jun 23, 2022 at 02:31:25PM -0500, Justin Pryzby wrote: > > Should any of the test changes go into v15 ? > > On Thu, Jul 07, 2022 at 07:22:32PM -0500, Justin Pryzby wrote: > > Also, cirrus/freebsd task can run 3x faster with more CPUs. > > Checking if there's interest in any/none of these patches ? > I have added several more. > > Do you have an idea when the meson branch might be merged ? I hope to do that fairly soon, but it's of course dependant on review etc. The plan was to merge it early and mature it in tree to some degree. There's only so much we can do "from the outside"... > Will vcregress remain for a while, or will it go away for v16 ? The plan was for the windows stuff to go away fairly quickly. > From 99ee0bef5054539aad0e23a24dd9c9cc9ccee788 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Wed, 25 May 2022 21:53:22 -0500 > Subject: [PATCH 01/25] cirrus/windows: add compiler_warnings_script Looks good. > - MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo > + # -fileLoggerParameters1: write warnings to msbuild.warn.log. > + MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log Except, I think it'd be good to split this line. What do you think about using something like MSBFLAGS: >- -nologo -m -verbosity:minimal /p:TrackFileAccess=false "-consoleLoggerParameters:Summary;ForceNoAlign" -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log I think that should work? > # If tests hang forever, cirrus eventually times out. In that case log > # output etc is not uploaded, making the problem hard to debug. Of course > @@ -450,6 +451,11 @@ task: > cd src/tools/msvc > %T_C% perl vcregress.pl ecpgcheck > > + # These should be last, so all the important checks are always run > + always: > + compiler_warnings_script: > + - sh src\tools\ci\windows-compiler-warnings msbuild.warn.log > + > on_failure: > <<: *on_failure > crashlog_artifacts: > diff --git a/src/tools/ci/windows-compiler-warnings b/src/tools/ci/windows-compiler-warnings > new file mode 100755 > index 00000000000..d6f9a1fc569 > --- /dev/null > +++ b/src/tools/ci/windows-compiler-warnings > @@ -0,0 +1,16 @@ > +#! /bin/sh > +# Success if the given file doesn't exist or is empty, else fail > +# This is a separate file only to avoid dealing with windows shell quoting and escaping. > +set -e > + > +fn=$1 > + > +if [ -s "$fn" ] > +then > + # Display the file's content, then exit indicating failure > + cat "$fn" > + exit 1 > +else > + # Success > + exit 0 > +fi > -- > 2.17.1 Wouldn't that be doable as something like sh -c 'if test -s file; then cat file;exit 1; fi" inside .cirrus.yml? > From 1064a0794e85e06b3a0eca4ed3765f078795cb36 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 3 Apr 2022 00:10:20 -0500 > Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats > > ccache since 4.0 enables zstd compression by default. > > With default compression enabled (https://cirrus-ci.com/task/6692342840164352): > linux has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616 > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500 > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064 > windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte 51179 > todo: compiler warnings > > With compression disabled (https://cirrus-ci.com/task/4614182514458624): > linux: 91MB cirrus cache; cache_size_kibibyte 316136 > macos: 41MB cirrus cache; cache_size_kibibyte 118068 > windows: 42MB cirrus cache; cache_size_kibibyte 134064 > freebsd is the same > > The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS, > to avoid re-uploading cache tarball in a 100% cached build, due only to > updating ./**/stats). > > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal. I stared at this commit message for a while, trying to make sense of it, and couldn't really. I assume you're saying that the cirrus compression is better with ccache compression disabled, but it's extremely hard to parse out of it. This does too much at once. Show stats, change cache sizes, disable compression. > From 01e9abd386a4e6cc0125b97617fb42e695898cbf Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Tue, 26 Jul 2022 20:30:02 -0500 > Subject: [PATCH 04/25] cirrus/ccache: add explicit cache keys.. > > Since otherwise, building with ci-os-only will probably fail to use the normal > cache, since the cache key is computed using both the task name and its *index* > in the list of caches (internal/executor/cache.go:184). Hm, perhaps worth confirming and/or reporting to cirrus rather? > From 8de5c977686270b0a4e666a924ebe820a245913a Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 24 Jul 2022 23:09:12 -0500 > Subject: [PATCH 05/25] silence make distprep and generated-headers > > this saves vertical screen space. > > https://www.postgresql.org/message-id/20220221164736.rq3ornzjdkmwk2wo@alap3.anarazel.de I don't feel this should go in as a part of CI changes. Or rather, I feel uncomfortable committing it when just discussed under this subject. > From eaf263ccaa8310c5d9834b97e93ad8434d63296e Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 24 Jul 2022 22:44:53 -0500 > Subject: [PATCH 06/25] pg_regress: run more quietly > > The number of lines of output should be closer to 1 per test, rather than 25 + > 1 per test. > > https://www.postgresql.org/message-id/flat/20220221173109.yl6dqqu3ud52ripd%40alap3.anarazel.de See above. There's also a dedicated thread about revising the output. > From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Fri, 24 Jun 2022 00:09:12 -0500 > Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not > repartitiion > > There was some historic problem where tests under freebsd took 8+ minutes (and > before 4a288a37f took 15 minutes). > > This reduces test time from 10min to 3min. > 4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720 > 4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008 > 4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600 > > 6 CPUs https://cirrus-ci.com/task/6678321684545536 > 8 CPUs https://cirrus-ci.com/task/6264854121021440 > > See also: > https://www.postgresql.org/message-id/flat/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de#f36c0b17e33e31e7925e7e5812998686 > 8 jobs 7min https://cirrus-ci.com/task/6186376667332608 > > xi-os-only: freebsd Typo. > @@ -71,8 +69,6 @@ task: > fingerprint_key: ccache/freebsd > reupload_on_changes: true > > - # Workaround around performance issues due to 32KB block size > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > create_user_script: | > pw useradd postgres > chown -R postgres:postgres . > -- What's the story there - at some point that was important for performance because of the native block size triggering significant read-modify-write cycles with postres' writes. You didn't comment on it in the commit message. > From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Wed, 27 Jul 2022 16:54:47 -0500 > Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath > From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 13 Feb 2022 17:56:40 -0600 > Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min No explanation? > From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Fri, 26 Aug 2022 12:00:10 -0500 > Subject: [PATCH 15/25] f!and chdir I don't see the point of pointing fixup commits to the list. Greetings, Andres Freund
On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote: > > --- /dev/null > > +++ b/src/tools/ci/windows-compiler-warnings > > Wouldn't that be doable as something like > sh -c 'if test -s file; then cat file;exit 1; fi" > inside .cirrus.yml? I had written it inline in a couple ways, like - sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi' but then separated it out as you suggested in 20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de after I complained about cmd.exe requiring escaping for && and || That makes writing any shell script a bit perilous and a separate script seems better. > > Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats > > > > ccache since 4.0 enables zstd compression by default. > > > > With default compression enabled (https://cirrus-ci.com/task/6692342840164352): > > linux has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616 > > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500 > > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064 > > windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte 51179 > > todo: compiler warnings > > > > With compression disabled (https://cirrus-ci.com/task/4614182514458624): > > linux: 91MB cirrus cache; cache_size_kibibyte 316136 > > macos: 41MB cirrus cache; cache_size_kibibyte 118068 > > windows: 42MB cirrus cache; cache_size_kibibyte 134064 > > freebsd is the same > > > > The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS, > > to avoid re-uploading cache tarball in a 100% cached build, due only to > > updating ./**/stats). > > > > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal. > > I stared at this commit message for a while, trying to make sense of it, and > couldn't really. I assume you're saying that the cirrus compression is better > with ccache compression disabled, but it's extremely hard to parse out of it. Yes, because ccache uses zstd-1, and cirrus uses gzip, which it's going to use no matter what ccache does, and gzip's default -6 is better than ccache's zstd-1. > This does too much at once. Show stats, change cache sizes, disable > compression. The cache size change is related to the compression level change; ccache prunes based on the local size, which was compressed with zstd-1 and, with this patch, not compressed (so ~2x larger). Also, it's more interesting to control the size uploaded to cirrus (after compression ith gzip-6). > > From 01e9abd386a4e6cc0125b97617fb42e695898cbf Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Tue, 26 Jul 2022 20:30:02 -0500 > > Subject: [PATCH 04/25] cirrus/ccache: add explicit cache keys.. > > > > Since otherwise, building with ci-os-only will probably fail to use the normal > > cache, since the cache key is computed using both the task name and its *index* > > in the list of caches (internal/executor/cache.go:184). > > Hm, perhaps worth confirming and/or reporting to cirrus rather? I know because of reading their source. Unfortunately, there's no commit history indicating the intent or rationale. https://github.com/cirruslabs/cirrus-ci-agent/blob/master/internal/executor/cache.go#L183 > > From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Fri, 24 Jun 2022 00:09:12 -0500 > > Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not > > repartitiion > > > > There was some historic problem where tests under freebsd took 8+ minutes (and > > before 4a288a37f took 15 minutes). > > > > This reduces test time from 10min to 3min. > > 4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720 > > 4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008 > > 4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600 > > > > 6 CPUs https://cirrus-ci.com/task/6678321684545536 > > 8 CPUs https://cirrus-ci.com/task/6264854121021440 > > > > See also: > > https://www.postgresql.org/message-id/flat/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de#f36c0b17e33e31e7925e7e5812998686 > > 8 jobs 7min https://cirrus-ci.com/task/6186376667332608 > > > > xi-os-only: freebsd > > Typo. No - it's deliberate so I can switch to and from "everything" to "this only". > > @@ -71,8 +69,6 @@ task: > > fingerprint_key: ccache/freebsd > > reupload_on_changes: true > > > > - # Workaround around performance issues due to 32KB block size > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > create_user_script: | > > pw useradd postgres > > chown -R postgres:postgres . > > -- > > What's the story there - at some point that was important for performance > because of the native block size triggering significant read-modify-write > cycles with postres' writes. You didn't comment on it in the commit message. Well, I don't know the history, but it seems to be unneeded now. Is there a good description of the original problem ? Originally, freebsd check-world took ~15min to run tests, and when we changed to use -Og it took 10min. Since then, seems to have improved on its own, and currently takes ~6min. This patch adds CPUs to make it run in ~4min, and takes the opportuity to drop the historic repartition stuff. > > From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Wed, 27 Jul 2022 16:54:47 -0500 > > Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath > > > From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Sun, 13 Feb 2022 17:56:40 -0600 > > Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min > > No explanation? Because of the immediately following commit which makes it run all the tests. > > From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Fri, 26 Aug 2022 12:00:10 -0500 > > Subject: [PATCH 15/25] f!and chdir > > I don't see the point of pointing fixup commits to the list. It's a separate commit to make it easy to see the changes, separately, since I imagine maybe the "chdir" part won't be desirable, or maybe the PATH part won't. But I'm not sure, so I'm here soliciting feedback. -- Justin
Hi, On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote: > On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote: > > > --- /dev/null > > > +++ b/src/tools/ci/windows-compiler-warnings > > > > Wouldn't that be doable as something like > > sh -c 'if test -s file; then cat file;exit 1; fi" > > inside .cirrus.yml? > > I had written it inline in a couple ways, like > - sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi' > > but then separated it out as you suggested in > 20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de > > after I complained about cmd.exe requiring escaping for && and || > That makes writing any shell script a bit perilous and a separate script > seems better. I remember that I suggested it - but note that the way I wrote above doesn't have anything needing escaping. Anyway, what do you think of the multiline split I suggested? > > > Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats > > > > > > ccache since 4.0 enables zstd compression by default. > > > > > > With default compression enabled (https://cirrus-ci.com/task/6692342840164352): > > > linux has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616 > > > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500 > > > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064 > > > windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte 51179 > > > todo: compiler warnings > > > > > > With compression disabled (https://cirrus-ci.com/task/4614182514458624): > > > linux: 91MB cirrus cache; cache_size_kibibyte 316136 > > > macos: 41MB cirrus cache; cache_size_kibibyte 118068 > > > windows: 42MB cirrus cache; cache_size_kibibyte 134064 > > > freebsd is the same > > > > > > The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS, > > > to avoid re-uploading cache tarball in a 100% cached build, due only to > > > updating ./**/stats). > > > > > > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal. > > > > I stared at this commit message for a while, trying to make sense of it, and > > couldn't really. I assume you're saying that the cirrus compression is better > > with ccache compression disabled, but it's extremely hard to parse out of it. > > Yes, because ccache uses zstd-1, and cirrus uses gzip, which it's going > to use no matter what ccache does, and gzip's default -6 is better than > ccache's zstd-1. > > > This does too much at once. Show stats, change cache sizes, disable > > compression. > > The cache size change is related to the compression level change; ccache > prunes based on the local size, which was compressed with zstd-1 and, > with this patch, not compressed (so ~2x larger). Also, it's more > interesting to control the size uploaded to cirrus (after compression > ith gzip-6). That's what should have been in the commit message. > > > From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001 > > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > > Date: Fri, 24 Jun 2022 00:09:12 -0500 > > > Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not > > > repartitiion > > > > > > There was some historic problem where tests under freebsd took 8+ minutes (and > > > before 4a288a37f took 15 minutes). > > > > > > This reduces test time from 10min to 3min. > > > 4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720 > > > 4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008 > > > 4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600 > > > > > > 6 CPUs https://cirrus-ci.com/task/6678321684545536 > > > 8 CPUs https://cirrus-ci.com/task/6264854121021440 > > > > > > See also: > > > https://www.postgresql.org/message-id/flat/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de#f36c0b17e33e31e7925e7e5812998686 > > > 8 jobs 7min https://cirrus-ci.com/task/6186376667332608 > > > > > > xi-os-only: freebsd > > > > Typo. > > No - it's deliberate so I can switch to and from "everything" to "this > only". I don't see the point in posting patches to be applied if they contain lots of such things that a potential committer would need to catch and include a lot of of fixup patches. > > > @@ -71,8 +69,6 @@ task: > > > fingerprint_key: ccache/freebsd > > > reupload_on_changes: true > > > > > > - # Workaround around performance issues due to 32KB block size > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > > create_user_script: | > > > pw useradd postgres > > > chown -R postgres:postgres . > > > -- > > > > What's the story there - at some point that was important for performance > > because of the native block size triggering significant read-modify-write > > cycles with postres' writes. You didn't comment on it in the commit message. > > Well, I don't know the history, but it seems to be unneeded now. It's possible it was mainly needed for testing with aio + dio. But also possible that an upgrade improved the situation since. > > > From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001 > > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > > Date: Wed, 27 Jul 2022 16:54:47 -0500 > > > Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath > > > > > From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001 > > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > > Date: Sun, 13 Feb 2022 17:56:40 -0600 > > > Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min > > > > No explanation? > > Because of the immediately following commit which makes it run all the > tests. Mention that in the commit message then. Especially when dealing with 25 commits I don't think you can expect others to infer such things. > > > From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001 > > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > > Date: Fri, 26 Aug 2022 12:00:10 -0500 > > > Subject: [PATCH 15/25] f!and chdir > > > > I don't see the point of pointing fixup commits to the list. > > It's a separate commit to make it easy to see the changes, separately, > since I imagine maybe the "chdir" part won't be desirable, or maybe the > PATH part won't. But I'm not sure, so I'm here soliciting feedback. Shrug, I doubt you'll get much if asked that way. Greetings, Andres Freund
On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote: > > On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote: > > > > --- /dev/null > > > > +++ b/src/tools/ci/windows-compiler-warnings > > > > > > Wouldn't that be doable as something like > > > sh -c 'if test -s file; then cat file;exit 1; fi" > > > inside .cirrus.yml? > > > > I had written it inline in a couple ways, like > > - sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi' > > > > but then separated it out as you suggested in > > 20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de > > > > after I complained about cmd.exe requiring escaping for && and || > > That makes writing any shell script a bit perilous and a separate script > > seems better. > > I remember that I suggested it - but note that the way I wrote above doesn't > have anything needing escaping. It doesn't require it, but that still gives the impression that it's normally possible to write one-liner shell scripts there, which is misleading/wrong, and the reason why I took your suggestion to use a separate script file. > Anyway, what do you think of the multiline split I suggested? Done, and sorted. > That's what should have been in the commit message. Sure. I copied into the commit message the explanation that I had written in June's email. > > > > xi-os-only: freebsd > > > > > > Typo. > > > > No - it's deliberate so I can switch to and from "everything" to "this > > only". > > I don't see the point in posting patches to be applied if they contain lots of > such things that a potential committer would need to catch and include a lot > of of fixup patches. I get that you disliked that I disabled the effect of a CI tag by munging "c" to "x". I've amended the message to avoid confusion. But, lots of what such things ? "ci-os-only" would be removed before being pushed anyway. "catching things" is the first part of the review process, which (as I understand) is intended to help patch authors to improve their patches. If you found lots of problems in my patches, I'd need to know about them; but most of what I heard seem like quibbles about the presentation of the patches. It's true that some parts are dirty/unclear, and that seems reasonable for patches most of which haven't yet received review, for which I asked whether to pursue the patch at all, and how best to present them. This is (or could be) an opportunity to make improvements. I renamed the two, related patches to Cluser.pm which said "f!", which are deliberately separate but looked like "fixup" patches. Are you interested in any combination of those three, related changes to move logic from Makefile to perl ? If not, we don't need to debate the merits of spliting the patch. What about the three, related changes for ccache compression ? Should these be dropped in favour of meson ? - cirrus/vcregress: test modules/contrib with NO_INSTALLCHECK=1 - vcregress: add alltaptests I added: "WIP: skip building if only docs have changed" changesInclude() didn't seem to work right when I first tried to use it. Eventually, I realized that it seems to use something like "git log", and not "git diff" (as I'd thought). It seems to work fine now that I know what to expect. git commit --amend --no-edit git diff --stat @{1}..@{0} # this outputs nothing git log --stat @{1}..@{0} # this lists the files changed by the tip commit It'd be nice to be have cfbot inject this patch into each commitfest patch for awhile, to make sure everything works as expected. Same for the code coverage patch and the doc artifacts patch. (These patches currently assume that the base commit is HEAD~1, which is correct for cfbot, and that would also provide code coverage and docs until such time as cfbot is updated to apply and preserve the original series of patches). -- Justin
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-vcregress-test-modules-contrib-with-NO_INSTAL.patch
- 0003-cirrus-ccache-disable-compression-and-show-stats.patch
- 0004-cirrus-ccache-add-explicit-cache-keys.patch
- 0005-cirrus-enable-various-runtime-checks-on-macos-and-fr.patch
- 0006-cirrus-freebsd-run-build-check-in-a-make-vpath.patch
- 0007-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch
- 0008-cirrus-linux-compile-with-fsanitize.patch
- 0009-vcregress-add-alltaptests.patch
- 0010-tmp-run-tap-tests-first.patch
- 0011-set-TESTDIR-from-src-test-perl-rather-than-Makefile-.patch
- 0012-.also-set-PATH.patch
- 0013-.and-chdir.patch
- 0014-vcregress-run-alltaptests-in-parallel.patch
- 0015-another-way-to-install-uri_regress-testclient.patch
- 0016-Move-libpq_pipeline-test-into-src-interfaces-libpq.patch
- 0017-cirrus-code-coverage.patch
- 0018-cirrus-build-docs-as-a-separate-task.patch
- 0019-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0020-html-index-file.patch
- 0021-cirrus-warnings-use-.-configure-cache-in-headerschec.patch
- 0022-cirrus-warnings-move-use-a-single-common-always-bloc.patch
- 0023-WIP-skip-building-if-only-docs-have-changed.patch
On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > > > > @@ -71,8 +69,6 @@ task: > > > > fingerprint_key: ccache/freebsd > > > > reupload_on_changes: true > > > > > > > > - # Workaround around performance issues due to 32KB block size > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > > > create_user_script: | > > > > pw useradd postgres > > > > chown -R postgres:postgres . > > > > -- > > > > > > What's the story there - at some point that was important for performance > > > because of the native block size triggering significant read-modify-write > > > cycles with postres' writes. You didn't comment on it in the commit message. > > > > Well, I don't know the history, but it seems to be unneeded now. > > It's possible it was mainly needed for testing with aio + dio. But also > possible that an upgrade improved the situation since. Maybe freebsd got faster as a result of the TAU CPUs? https://mobile.twitter.com/cirrus_labs/status/1534982111568052240 I noticed because it's been *slower* the last ~24h since cirrusci disabled TAU, as Thomas commit mentioned. https://twitter.com/cirrus_labs/status/1572657320093712384 For example this CF entry: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3736 https://cirrus-ci.com/task/4670794365140992 5m36s - 4days ago https://cirrus-ci.com/task/4974926233862144 5m25s - 3days ago https://cirrus-ci.com/task/5561409034518528 5m29s - 2days ago https://cirrus-ci.com/task/6432442008469504 9m19s - yesterday CF_BOT's latest tasks seem to be fast again, since 1-2h ago. https://cirrus-ci.com/build/5178906041909248 9m1s https://cirrus-ci.com/build/4593160281128960 5m8s https://cirrus-ci.com/build/4539845644124160 5m22s The logs for July show when freebsd started "being fast": https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3708 https://cirrus-ci.com/task/6316073015312384 10m25s Jul 13 https://cirrus-ci.com/task/5662878987452416 5m48s Jul 15 Maybe that changed in July rather than June because the TAU CPUs were still not available in every region/zone (?) https://cloud.google.com/compute/docs/regions-zones/ I have no idea if the TAU CPUs eliminate/mitigate the original performance issue you had with AIO. But they have such a large effect on freebsd that it could now be the fastest task, if given more than 2 CPUs. -- Justin
Hi, On 2022-09-22 16:07:02 -0500, Justin Pryzby wrote: > On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > > > > > @@ -71,8 +69,6 @@ task: > > > > > fingerprint_key: ccache/freebsd > > > > > reupload_on_changes: true > > > > > > > > > > - # Workaround around performance issues due to 32KB block size > > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > > > > create_user_script: | > > > > > pw useradd postgres > > > > > chown -R postgres:postgres . > > > > > -- > > > > > > > > What's the story there - at some point that was important for performance > > > > because of the native block size triggering significant read-modify-write > > > > cycles with postres' writes. You didn't comment on it in the commit message. > > > > > > Well, I don't know the history, but it seems to be unneeded now. > > > > It's possible it was mainly needed for testing with aio + dio. But also > > possible that an upgrade improved the situation since. > > Maybe freebsd got faster as a result of the TAU CPUs? > https://mobile.twitter.com/cirrus_labs/status/1534982111568052240 > > I noticed because it's been *slower* the last ~24h since cirrusci > disabled TAU, as Thomas commit mentioned. > https://twitter.com/cirrus_labs/status/1572657320093712384 Yea, I noticed that as well. It's entirely possible that something in the "hardware" stack improved sufficiently to avoid problems. > I have no idea if the TAU CPUs eliminate/mitigate the original > performance issue you had with AIO. But they have such a large effect > on freebsd that it could now be the fastest task, if given more than 2 > CPUs. I'm planning to rebase early next week and try that out. Greetings, Andres Freund
On Fri, Sep 23, 2022 at 9:07 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > > > > > - # Workaround around performance issues due to 32KB block size > > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > > > > create_user_script: | > > > > > pw useradd postgres > > > > > chown -R postgres:postgres . > > > > > > > > What's the story there - at some point that was important for performance > > > > because of the native block size triggering significant read-modify-write > > > > cycles with postres' writes. You didn't comment on it in the commit message. > > > > > > Well, I don't know the history, but it seems to be unneeded now. > > > > It's possible it was mainly needed for testing with aio + dio. But also > > possible that an upgrade improved the situation since. Yeah, it is very important for direct I/O (patches soon...), because every 8KB random write becomes a read-32KB/wait/write-32KB without it. It's slightly less important for buffered I/O, because the kernel caches hide that, but it still triggers I/O bandwidth amplification, and we definitely saw positive effects earlier on the CI system back on the previous generation. FWIW I am planning to see about getting the FreeBSD installer to create the root file system the way we want, instead of this ugliness. > Maybe freebsd got faster as a result of the TAU CPUs? > [data] Very interesting. Would be good to find the exact instance types + storage types to see if there has been a massive IOPS boost, perhaps via local SSD. The newer times are getting closer to a local developer machine.
Hi, On 2022-09-10 15:05:42 -0500, Justin Pryzby wrote: > From 4ed5eb427de4508a4c3422e60891b45c8512814a Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 3 Apr 2022 00:10:20 -0500 > Subject: [PATCH 03/23] cirrus/ccache: disable compression and show stats > > Since v4.0, ccache enables zstd compression by default, saving roughly > 2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable > ccache compression, allowing cirrus to gzip the uncompressed data > (better than ccache's default of zstd-1). I wonder whether we could instead change CCACHE_COMPRESSLEVEL (maybe 3, zstd's default IIRC). It'd be good if we could increase cache utilization. > From 0bd5f51b8c143ed87a867987309d66b8554b1fd6 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Thu, 14 Apr 2022 06:27:07 -0500 > Subject: [PATCH 05/23] cirrus: enable various runtime checks on macos and > freebsd > > windows is slower than freebsd and mac, so it's okay to enable options which > will slow them down some. Also, the cirrusci mac instances always have lot of > cores available. > See: > https://www.postgresql.org/message-id/20211217193159.pwrelhiyx7kevgsn@alap3.anarazel.de > https://www.postgresql.org/message-id/20211213211223.vkgg3wwiss2tragj%40alap3.anarazel.de > https://www.postgresql.org/message-id/CAH2-WzmevBhKNEtqX3N-Tkb0gVBHH62C0KfeTxXzqYES_PiFiA%40mail.gmail.com > https://www.postgresql.org/message-id/20220325000933.vgazz7pjk2ytj65d@alap3.anarazel.de > > ci-os-only: freebsd, macos > --- > .cirrus.yml | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/.cirrus.yml b/.cirrus.yml > index 183e8746ce6..4ad20892eeb 100644 > --- a/.cirrus.yml > +++ b/.cirrus.yml > @@ -113,7 +113,9 @@ task: > \ > CC="ccache cc" \ > CXX="ccache c++" \ > - CFLAGS="-Og -ggdb" > + CPPFLAGS="-DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST"\ > + CXXFLAGS="-Og -ggdb -march=native -mtune=native" \ > + CFLAGS="-Og -ggdb -march=native -mtune=native" What's reason for -march=native -mtune=native here? > EOF > build_script: | > su postgres -c "ccache --zero-stats" > @@ -336,8 +338,8 @@ task: > CC="ccache cc" \ > CXX="ccache c++" \ > CLANG="ccache ${brewpath}/llvm/bin/ccache" \ > - CFLAGS="-Og -ggdb" \ > - CXXFLAGS="-Og -ggdb" \ > + CFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \ > + CXXFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \ > \ > LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \ > PYTHON=python3 I'd also use CPPFLAGS here, given you'd used it above... I'm planning to commit an updated version of this change soon, without the -march=native -mtune=native bit, unless somebody protests... Greetings, Andres Freund
On Sat, Oct 01, 2022 at 05:45:01PM -0700, Andres Freund wrote: > Hi, > > On 2022-09-10 15:05:42 -0500, Justin Pryzby wrote: > > From 4ed5eb427de4508a4c3422e60891b45c8512814a Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Sun, 3 Apr 2022 00:10:20 -0500 > > Subject: [PATCH 03/23] cirrus/ccache: disable compression and show stats > > > > Since v4.0, ccache enables zstd compression by default, saving roughly > > 2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable > > ccache compression, allowing cirrus to gzip the uncompressed data > > (better than ccache's default of zstd-1). > > I wonder whether we could instead change CCACHE_COMPRESSLEVEL (maybe 3, zstd's > default IIRC). It'd be good if we could increase cache utilization. I considered that (and I think that's what I wrote initially). I figured that if cirrus is going to use gzip-6 (tar.gz) in any case, we might as well disable compression. Then, all the tasks are also doing the same thing (half the tasks have ccache before 4.0). > > From 0bd5f51b8c143ed87a867987309d66b8554b1fd6 Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Thu, 14 Apr 2022 06:27:07 -0500 > > Subject: [PATCH 05/23] cirrus: enable various runtime checks on macos and > > freebsd > > > > windows is slower than freebsd and mac, so it's okay to enable options which > > will slow them down some. Also, the cirrusci mac instances always have lot of > > cores available. > > > See: > > https://www.postgresql.org/message-id/20211217193159.pwrelhiyx7kevgsn@alap3.anarazel.de > > https://www.postgresql.org/message-id/20211213211223.vkgg3wwiss2tragj%40alap3.anarazel.de > > https://www.postgresql.org/message-id/CAH2-WzmevBhKNEtqX3N-Tkb0gVBHH62C0KfeTxXzqYES_PiFiA%40mail.gmail.com > > https://www.postgresql.org/message-id/20220325000933.vgazz7pjk2ytj65d@alap3.anarazel.de > > > > ci-os-only: freebsd, macos > > --- > > .cirrus.yml | 8 +++++--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/.cirrus.yml b/.cirrus.yml > > index 183e8746ce6..4ad20892eeb 100644 > > --- a/.cirrus.yml > > +++ b/.cirrus.yml > > @@ -113,7 +113,9 @@ task: > > \ > > CC="ccache cc" \ > > CXX="ccache c++" \ > > - CFLAGS="-Og -ggdb" > > + CPPFLAGS="-DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST"\ > > + CXXFLAGS="-Og -ggdb -march=native -mtune=native" \ > > + CFLAGS="-Og -ggdb -march=native -mtune=native" > > What's reason for -march=native -mtune=native here? No particular reason, and my initial patch didn't have it. I suppose I added it to test its effect and never got rid of it. > > EOF > > build_script: | > > su postgres -c "ccache --zero-stats" > > @@ -336,8 +338,8 @@ task: > > CC="ccache cc" \ > > CXX="ccache c++" \ > > CLANG="ccache ${brewpath}/llvm/bin/ccache" \ > > - CFLAGS="-Og -ggdb" \ > > - CXXFLAGS="-Og -ggdb" \ > > + CFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \ > > + CXXFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \ > > \ > > LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \ > > PYTHON=python3 > > I'd also use CPPFLAGS here, given you'd used it above... > > I'm planning to commit an updated version of this change soon, without the > -march=native -mtune=native bit, unless somebody protests... One other thing is that your -m32 changes caused the linux/meson task to take an additional 3+ minutes (total ~8). That's no issue, except that the Warnings task depends on the linux/mason task, and itself can take up to 15 minutes. So those two potentially take as long as the windows task. I suggested that CompileWarnings could instead "Depend on: Freebsd", which currently takes 6-7min (and could take 4-5min if given more CPUs). -- Justin
Hi, On 2022-10-01 19:58:01 -0500, Justin Pryzby wrote: > One other thing is that your -m32 changes caused the linux/meson task to > take an additional 3+ minutes (total ~8). That's no issue, except that > the Warnings task depends on the linux/mason task, and itself can take > up to 15 minutes. > So those two potentially take as long as the windows task. > I suggested that CompileWarnings could instead "Depend on: Freebsd", > which currently takes 6-7min (and could take 4-5min if given more CPUs). I am wondering if we should instead introduce a new "quickcheck" task that just compiles and runs maybe one test and have *all* other tests depend on that. Wasting a precious available windows instance to just fail to build or immediately fail during tests doesn't really make sense. Greetings, Andres Freund
Hi, On 2022-10-01 18:36:41 -0700, Andres Freund wrote: > I am wondering if we should instead introduce a new "quickcheck" task that > just compiles and runs maybe one test and have *all* other tests depend on > that. Wasting a precious available windows instance to just fail to build or > immediately fail during tests doesn't really make sense. Attached is an implementation of that idea. I fairly randomly chose two quick tests to execute as part of the sanity check, cube/regress pg_ctl/001_start_stop. I wanted to have coverage for initdb, a pg_regress style test, a tap test, some other client binary. With a primed cache this takes ~32s, not too bad imo. 12s of that is cloning the repo. What do you think? We could bake a bare repo into the images to make the clone step in faster, but that'd be for later anyway. set -e rm -rf /tmp/pg-clone-better mkdir /tmp/pg-clone-better cd /tmp/pg-clone-better git init --bare git remote add origin https://github.com/postgres/postgres.git --no-tags -t 'REL_*' -t master git fetch -v git repack -ad -f du -sh results in a 227MB repo. git clone https://github.com/anarazel/postgres.git -v --depth 1000 -b ci-sanitycheck --reference /tmp/pg-clone-better /tmp/pg-clone-better-clone clones an example branch in ~1.35s. Greetings, Andres Freund
Attachment
On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote: > Hi, > > On 2022-10-01 18:36:41 -0700, Andres Freund wrote: > > I am wondering if we should instead introduce a new "quickcheck" task that > > just compiles and runs maybe one test and have *all* other tests depend on > > that. Wasting a precious available windows instance to just fail to build or > > immediately fail during tests doesn't really make sense. > With a primed cache this takes ~32s, not too bad imo. 12s of that is > cloning the repo. Maybe - that would avoid waiting 4 minutes for a windows instance to start in the (hopefully atypical) case of a patch that fails in 1-2 minutes under linux/freebsd. If the patch were completely broken, the windows task would take ~4min to start, plus up to ~4min before failing to compile or failing an early test. 6-8 minutes isn't nothing, but doesn't seem worth the added complexity. Also, this would mean that in the common case, the slowest task would be delayed until after the SanityCheck task instance starts, compiles, and runs some test :( Your best case is 32sec, but I doubt that's going to be typical. I was thinking about the idea of cfbot handling "tasks" separately, similar to what it used to do with travis/appveyor. The logic for "windows tasks are only run if linux passes tests" could live there. That could also be useful if there's ever the possibility of running an additional OS on another CI provider, or if another provider can run windows tasks faster, or if we need to reduce our load/dependency on cirrus. I realized that goes backwards in some ways to the direction we've gone with cirrus, and I'm not sure how exactly it would do that (I suppose it might add ci-os-only tags to its commit message). > + # no options enabled, should be small > + CCACHE_MAXSIZE: "150M" Actually, tasks can share caches if the "cache key" is set. If there was a separate "Sanity" task, I think it should use whatever flags linux (or freebsd) use to avoid doing two compilations (with lots of cache misses for patches which modify *.h files, which would then happen twice, in serial). > + # use always: to continue after failures. Task that did not run count as a > + # success, so we need to recheck SanityChecks's condition here ... > - # task that did not run, count as a success, so we need to recheck Linux' > - # condition here ... Another/better justification/description is that "cirrus warns if the depending task has different only_if conditions than the dependant task". -- Justin
Hi, On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote: > Maybe - that would avoid waiting 4 minutes for a windows instance to > start in the (hopefully atypical) case of a patch that fails in 1-2 > minutes under linux/freebsd. > > If the patch were completely broken, the windows task would take ~4min > to start, plus up to ~4min before failing to compile or failing an early > test. 6-8 minutes isn't nothing, but doesn't seem worth the added > complexity. Avoiding 6-8mins of wasted windows time would, I think, allow us to crank cfbot's concurrency up a notch or two. > Also, this would mean that in the common case, the slowest task would be > delayed until after the SanityCheck task instance starts, compiles, and > runs some test :( Your best case is 32sec, but I doubt that's going to > be typical. Even the worst case isn't that bad, the uncached minimal build is 67s. > I was thinking about the idea of cfbot handling "tasks" separately, > similar to what it used to do with travis/appveyor. The logic for > "windows tasks are only run if linux passes tests" could live there. I don't really see the advantage of doing that over just increasing concurrency by a bit. > > + # no options enabled, should be small > > + CCACHE_MAXSIZE: "150M" > > Actually, tasks can share caches if the "cache key" is set. > If there was a separate "Sanity" task, I think it should use whatever > flags linux (or freebsd) use to avoid doing two compilations (with lots > of cache misses for patches which modify *.h files, which would then > happen twice, in serial). I think the price of using exactly the same flags is higher than the gain. And it'll rarely work if we use the container task for the sanity check, as the timestamps of the compiler, system headers etc will be different. > > + # use always: to continue after failures. Task that did not run count as a > > + # success, so we need to recheck SanityChecks's condition here ... > > > - # task that did not run, count as a success, so we need to recheck Linux' > > - # condition here ... > > Another/better justification/description is that "cirrus warns if the > depending task has different only_if conditions than the dependant task". That doesn't really seem easier to understand to me. Greetings, Andres Freund
Hi, On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote: > On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote: > > On 2022-10-01 18:36:41 -0700, Andres Freund wrote: > > > I am wondering if we should instead introduce a new "quickcheck" task that > > > just compiles and runs maybe one test and have *all* other tests depend on > > > that. Wasting a precious available windows instance to just fail to build or > > > immediately fail during tests doesn't really make sense. > > > With a primed cache this takes ~32s, not too bad imo. 12s of that is > > cloning the repo. > > Maybe - that would avoid waiting 4 minutes for a windows instance to > start in the (hopefully atypical) case of a patch that fails in 1-2 > minutes under linux/freebsd. > > If the patch were completely broken, the windows task would take ~4min > to start, plus up to ~4min before failing to compile or failing an early > test. 6-8 minutes isn't nothing, but doesn't seem worth the added > complexity. Btw, the motivation to work on this just now was that I'd like to enable more sanitizers (undefined,alignment for linux-meson, address for linux-autoconf). Yes, we could make the dependency on freebsd instead, but I'd like to try to enable the clang-only memory sanitizer there (if it works on freebsd)... Greetings, Andres Freund
On Sun, Oct 02, 2022 at 02:54:21PM -0700, Andres Freund wrote: > the clang-only memory sanitizer there (if it works on freebsd)... Have you looked at this much ? I think it'll require a bunch of exclusions, right ? -- Justin
On Sat, Sep 10, 2022 at 03:05:42PM -0500, Justin Pryzby wrote: > On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > > On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote: > > > On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote: > > > > > --- /dev/null > > > > > +++ b/src/tools/ci/windows-compiler-warnings > > > > > > > > Wouldn't that be doable as something like > > > > sh -c 'if test -s file; then cat file;exit 1; fi" > > > > inside .cirrus.yml? > > > > > > I had written it inline in a couple ways, like > > > - sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi' > > > > > > but then separated it out as you suggested in > > > 20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de > > > > > > after I complained about cmd.exe requiring escaping for && and || > > > That makes writing any shell script a bit perilous and a separate script > > > seems better. > > > > I remember that I suggested it - but note that the way I wrote above doesn't > > have anything needing escaping. > > It doesn't require it, but that still gives the impression that it's > normally possible to write one-liner shell scripts there, which is > misleading/wrong, and the reason why I took your suggestion to use a > separate script file. > > > Anyway, what do you think of the multiline split I suggested? > > Done, and sorted. Rewrote this and rebased some of the other stuff on top of the meson commit, for which I also include some new patches.
Attachment
- 0001-meson-PROVE-is-not-required.patch
- 0002-meson-other-fixes-for-cygwin.patch
- 0003-meson-rename-main-tasks-to-regress-and-isolation.patch
- 0004-cirrus-windows-add-compiler_warnings_script.patch
- 0005-cirrus-build-docs-as-a-separate-task.patch
- 0006-cirrus-ccache-add-explicit-cache-keys.patch
- 0007-cirrus-ccache-disable-compression-and-show-stats.patch
- 0008-cirrus-warnings-use-a-single-common-always-block.patch
Hi, On 2022-11-04 18:54:12 -0500, Justin Pryzby wrote: > Subject: [PATCH 1/8] meson: PROVE is not required > Subject: [PATCH 3/8] meson: rename 'main' tasks to 'regress' and 'isolation' Pushed, thanks for the patches. > Subject: [PATCH 2/8] meson: other fixes for cygwin > > XXX: what about HAVE_BUGGY_STRTOF ? What are you wondering about here? Shouldn't that continue to be set via src/include/port/cygwin.h? > diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build > index 3dcfc11278f..6ec3c77af53 100644 > --- a/src/test/regress/meson.build > +++ b/src/test/regress/meson.build > @@ -10,7 +10,7 @@ regress_sources = pg_regress_c + files( > # patterns like ".*-.*-mingw.*". We probably can do better, but for now just > # replace 'gcc' with 'mingw' on windows. > host_tuple_cc = cc.get_id() > -if host_system == 'windows' and host_tuple_cc == 'gcc' > +if host_system in ['windows', 'cygwin'] and host_tuple_cc == 'gcc' > host_tuple_cc = 'mingw' > endif This doesn't quite seem right - shouldn't it say cywin? Not that it makes a difference right now, given the contents of resultmap: float4:out:.*-.*-cygwin.*=float4-misrounded-input.out float4:out:.*-.*-mingw.*=float4-misrounded-input.out > From 0acbbd2fdd97bbafc5c4552e26f92d52c597eea9 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Wed, 25 May 2022 21:53:22 -0500 > Subject: [PATCH 4/8] cirrus/windows: add compiler_warnings_script > > I'm not sure how to write this test in windows shell; it's also not easy to > write it in posix sh, since windows shell is somehow interpretting && and ||... > > https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de > > See also: > 8a1ce5e54f6d144e4f8e19af7c767b026ee0c956 > https://cirrus-ci.com/task/6241060062494720 > https://cirrus-ci.com/task/6496366607204352 > > ci-os-only: windows > --- > .cirrus.yml | 10 +++++++++- > src/tools/ci/windows-compiler-warnings | 24 ++++++++++++++++++++++++ > 2 files changed, 33 insertions(+), 1 deletion(-) > create mode 100755 src/tools/ci/windows-compiler-warnings > > diff --git a/.cirrus.yml b/.cirrus.yml > index 9f2282471a9..99ac09dc679 100644 > --- a/.cirrus.yml > +++ b/.cirrus.yml > @@ -451,12 +451,20 @@ task: > > build_script: | > vcvarsall x64 > - ninja -C build > + ninja -C build |tee build/meson-logs/build.txt > + REM Since pipes lose exit status of the preceding command, rerun compilation, > + REM without the pipe exiting now if it fails, rather than trying to run checks > + ninja -C build > nul This seems mighty grotty :(. but I guess it's quick enough not worry about, and I can't come up with a better plan. It doesn't seem quite right to redirect into meson-logs/ to me, my interpretation is that that's "meson's namespace". Why not just store it in build/? > From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sat, 26 Feb 2022 19:34:35 -0600 > Subject: [PATCH 5/8] cirrus: build docs as a separate task.. > > This will run the doc build if any docs have changed, even if Linux > fails, to allow catch doc build failures. > > This'll automatically show up as a separate "column" on cfbot. > > Also, in the future, this will hopefully upload each patch's changed HTML docs > as an artifact, for easy review. > > Note that this is currently building docs with both autoconf and meson. > > ci-os-only: html > --- > .cirrus.yml | 62 +++++++++++++++++++++++++++++++++++++---------------- > 1 file changed, 44 insertions(+), 18 deletions(-) > > diff --git a/.cirrus.yml b/.cirrus.yml > index 99ac09dc679..37fd79e5b77 100644 > --- a/.cirrus.yml > +++ b/.cirrus.yml > @@ -472,6 +472,9 @@ task: > type: text/plain > > > +### > +# Test that code can be built with gcc/clang without warnings > +### > task: > name: CompilerWarnings > > @@ -515,10 +518,6 @@ task: > #apt-get update > #DEBIAN_FRONTEND=noninteractive apt-get -y install ... > > - ### > - # Test that code can be built with gcc/clang without warnings > - ### > - Why remove this? > setup_script: echo "COPT=-Werror" > src/Makefile.custom > > # Trace probes have a history of getting accidentally broken. Use the > @@ -580,20 +579,6 @@ task: > make -s -j${BUILD_JOBS} clean > time make -s -j${BUILD_JOBS} world-bin > > - ### > - # Verify docs can be built > - ### > - # XXX: Only do this if there have been changes in doc/ since last build > - always: > - docs_build_script: | > - time ./configure \ > - --cache gcc.cache \ > - CC="ccache gcc" \ > - CXX="ccache g++" \ > - CLANG="ccache clang" > - make -s -j${BUILD_JOBS} clean > - time make -s -j${BUILD_JOBS} -C doc > - > ### > # Verify headerscheck / cpluspluscheck succeed > # > @@ -617,3 +602,44 @@ task: > > always: > upload_caches: ccache > + > + > +### > +# Verify docs can be built > +# changesInclude() will skip this task if none of the commits since > +# CIRRUS_LAST_GREEN_CHANGE touched any relevant files. The comparison appears > +# to be like "git log a..b -- ./file", not "git diff a..b -- ./file" > +### > + > +task: > + name: Documentation > + > + env: > + CPUS: 1 > + BUILD_JOBS: 1 > + > + only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*' > + skip: "!changesInclude('.cirrus.yml', 'doc/**')" Perhaps we should introduce something other than ci-os-only if we want that to include things like "docs and html". At least this should update src/tools/ci/README. > + sysinfo_script: | > + id > + uname -a > + cat /proc/cmdline > + ulimit -a -H && ulimit -a -S > + export I think we can skip this here. > + # Exercise HTML and other docs: > + ninja_docs_build_script: | > + mkdir build.ninja > + cd build.ninja Perhaps build-ninja instead? build.ninja is the filename for ninja's build instructions, so it seems a bit confusing. > From adebe93a4409990e929f2775d45c6613134a4243 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Tue, 26 Jul 2022 20:30:02 -0500 > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys.. > > Since otherwise, building with ci-os-only will probably fail to use the > normal cache, since the cache key is computed using both the task name > and its *index* in the list of caches (internal/executor/cache.go:184). Seems like this would potentially better addressed by reporting a bug to the cirrus folks? > ccache_cache: > folder: ${CCACHE_DIR} > + fingerprint_key: ccache/linux > + reupload_on_changes: true There's enough copies of this to make it worth deduplicating. If we use something like fingerprint_script: echo ccache/$CIRRUS_BRANCH/$CIRRUS_OS we can use a yaml ref? I think you experimented with creating a 'base' ccache dir (e.g. on the master branch) and then using branch specific secondar caches? How did that turn out? I think cfbot's caches constantly get removed due to overrunning the global space. > From f16739bc5d2087847129baf663aa25fa9edb8449 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 3 Apr 2022 00:10:20 -0500 > Subject: [PATCH 7/8] cirrus/ccache: disable compression and show stats > Since v4.0, ccache enables zstd compression by default, saving roughly > 2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable > ccache compression, allowing cirrus to gzip the uncompressed data > (better than ccache's default of zstd-1). > > With default compression enabled (https://cirrus-ci.com/task/6692342840164352): > linux/debian/bullseye has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616 > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500 > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064 > XXX windows has 4.7.2; 180MB cirrus cache; cache_size_kibibyte 51179 > todo: compiler warnings > > With compression disabled (https://cirrus-ci.com/task/4614182514458624): > linux: 91MB cirrus cache; cache_size_kibibyte 316136 > macos: 41MB cirrus cache; cache_size_kibibyte 118068 > windows: 42MB cirrus cache; cache_size_kibibyte 134064 > freebsd is the same I'm still somewhat doubtful this is a good idea. The mingw cache is huge, for example, and all that additional IO and memory usage is bound to show up. > The stats should be shown and/or logged. > ccache --show-stats shows the *cumulative* stats (including prior > compilations) > ccache --zero-stats clears out not only the global stats, but the > per-file cache stats (from which the global stats are derived) - which > obviously makes the cache work poorly. > > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal. > The log should be written *outside* the ccache folder - it shouldn't be > preserved across cirrusci task invocations. I assume we don't have a new enough ccache everywhere yet? Greetings, Andres Freund
On Fri, Nov 04, 2022 at 06:59:46PM -0700, Andres Freund wrote: > Hi, > > On 2022-11-04 18:54:12 -0500, Justin Pryzby wrote: > > Subject: [PATCH 1/8] meson: PROVE is not required > > Subject: [PATCH 3/8] meson: rename 'main' tasks to 'regress' and 'isolation' > > Pushed, thanks for the patches. Thanks. > > diff --git a/.cirrus.yml b/.cirrus.yml > > index 9f2282471a9..99ac09dc679 100644 > > --- a/.cirrus.yml > > +++ b/.cirrus.yml > > @@ -451,12 +451,20 @@ task: > > > > build_script: | > > vcvarsall x64 > > - ninja -C build > > + ninja -C build |tee build/meson-logs/build.txt > > + REM Since pipes lose exit status of the preceding command, rerun compilation, > > + REM without the pipe exiting now if it fails, rather than trying to run checks > > + ninja -C build > nul > > This seems mighty grotty :(. but I guess it's quick enough not worry about, > and I can't come up with a better plan. > > It doesn't seem quite right to redirect into meson-logs/ to me, my > interpretation is that that's "meson's namespace". Why not just store it in > build/? I put it there so it'd be included with the build artifacts. Maybe it's worth adding a separate line to artifacts for stuff like this, and ccache log ? > > From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Sat, 26 Feb 2022 19:34:35 -0600 > > Subject: [PATCH 5/8] cirrus: build docs as a separate task.. > > + # Exercise HTML and other docs: > > + ninja_docs_build_script: | > > + mkdir build.ninja > > + cd build.ninja > > Perhaps build-ninja instead? build.ninja is the filename for ninja's build > instructions, so it seems a bit confusing. Sure. Do you think building docs with both autoconf and meson is what's desirable here ? I'm not sure if this ought to be combined with/before/after your "move compilerwarnings task to meson" patch? (Regarding that patch: I mentioned that it shouldn't use ccache -C, and it should use meson_log_artifacts.) > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Tue, 26 Jul 2022 20:30:02 -0500 > > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys.. > > > > Since otherwise, building with ci-os-only will probably fail to use the > > normal cache, since the cache key is computed using both the task name > > and its *index* in the list of caches (internal/executor/cache.go:184). > > Seems like this would potentially better addressed by reporting a bug to the > cirrus folks? You said that before, but I don't think so - since they wrote code to do that, it's odd to file a bug that says that the behavior is wrong. I am curious why, but it seems delibrate. https://www.postgresql.org/message-id/20220828171029.GO2342%40telsasoft.com > There's enough copies of this to make it worth deduplicating. If we > use something like fingerprint_script: echo > ccache/$CIRRUS_BRANCH/$CIRRUS_OS we can use a yaml ref? I'll look into it... > I think you experimented with creating a 'base' ccache dir (e.g. on the master > branch) and then using branch specific secondar caches? I have to revisit that sometime. That's a new feaure in ccache 4.4, which is currently only in macos. This is another thing that it'd be easier to test by having cfbot clobber the cirrus.yml rather than committing to postgres repo. (Technically, it should probably only do use the in-testing cirrus.yml if the patch it's testing doesn't itself modify .cirrus.yml) > How did that turn out? I think cfbot's caches constantly get removed > due to overrunning the global space. For cfbot, I don't know if there's much hope that any patch-specific build artifacts will be cached from the previous run, typically ~24h prior. One idea I have, for the "Warnings" task (and maybe linux too), is to defer pruning until after all the compilations. To avoid LRU pruning during early tasks causing bad hit ratios of later tasks. Another thing that probably happens is that task1 starts compiling patch1, and then another instance of task1 starts compiling patch2. A bit later, the first instance will upload its ccache result for patch1, which will be summarily overwritten by the second instance's compilation result, which doesn't include anything from the first instance. Also, whenever ccache hits its MAXSIZE threshold, it prunes the cache down to 80% of the configured size, which probably wipes away everything from all but the most recent ~20 builds. I also thought about having separate caches for each compilation in the warnings task - but that requires too much repeated yaml just for that.. > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > Date: Sun, 3 Apr 2022 00:10:20 -0500 > > Subject: [PATCH 7/8] cirrus/ccache: disable compression and show stats > > > > linux/debian/bullseye has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616 > > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500 > > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064 > > XXX windows has 4.7.2; 180MB cirrus cache; cache_size_kibibyte 51179 > > I'm still somewhat doubtful this is a good idea. The mingw cache is huge, for > example, and all that additional IO and memory usage is bound to show up. I think you're referring to the proposed mingw task which runs under windows, and not the existing cross-compilation ? And you're right - I remember this now (I think it's due to PCH?) In my local copy I'd "unset CCACHE_NOCOMPRESS". But I view that as an oddity of windows headers, rather than an argument against disabling compression elsewhere. BTW, freebsd ccache is too old to use compression. What about using CCACHE_HARDLINK (which implies no compression) ? > > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal. > > The log should be written *outside* the ccache folder - it shouldn't be > > preserved across cirrusci task invocations. > > I assume we don't have a new enough ccache everywhere yet? No - see above. I've added patches to update macos. -- Justin
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-build-docs-as-a-separate-task.patch
- 0003-cirrus-ccache-add-explicit-cache-keys.patch
- 0004-cirrus-ccache-avoid-clearing-cache-until-after-all-c.patch
- 0005-cirrus-ccache-disable-compression-and-show-stats.patch
- 0006-cirrus-warnings-use-a-single-common-always-block.patch
- 0007-cirrus-clean-up-windows-task.patch
- 0008-cirrus-switch-to-macos_instance.patch
- 0009-cirrus-update-to-macos-ventura.patch
- 0010-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch
- 0011-cirrus-split-linux-and-move-the-only_if-line.patch
Hi, On 2022-10-02 14:54:21 -0700, Andres Freund wrote: > On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote: > > On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote: > > > On 2022-10-01 18:36:41 -0700, Andres Freund wrote: > > > > I am wondering if we should instead introduce a new "quickcheck" task that > > > > just compiles and runs maybe one test and have *all* other tests depend on > > > > that. Wasting a precious available windows instance to just fail to build or > > > > immediately fail during tests doesn't really make sense. > > > > > With a primed cache this takes ~32s, not too bad imo. 12s of that is > > > cloning the repo. > > > > Maybe - that would avoid waiting 4 minutes for a windows instance to > > start in the (hopefully atypical) case of a patch that fails in 1-2 > > minutes under linux/freebsd. > > > > If the patch were completely broken, the windows task would take ~4min > > to start, plus up to ~4min before failing to compile or failing an early > > test. 6-8 minutes isn't nothing, but doesn't seem worth the added > > complexity. > > Btw, the motivation to work on this just now was that I'd like to enable more > sanitizers (undefined,alignment for linux-meson, address for > linux-autoconf). Yes, we could make the dependency on freebsd instead, but I'd > like to try to enable the clang-only memory sanitizer there (if it works on > freebsd)... I've used this a bunch on personal branches, and I think it's the way to go. It doesn't take long, saves a lot of cycles when one pushes something broken. Starts to runs the CompilerWarnings task after a minimal amount of sanity checking, instead of having to wait for a task running all tests, without the waste of running it immediately and failing all the different configurations, which takes forever. Greetings, Andres Freund
On Wed, Nov 16, 2022 at 07:48:14PM -0800, Andres Freund wrote: > I've used this a bunch on personal branches, and I think it's the way to > go. It doesn't take long, saves a lot of cycles when one pushes something > broken. Starts to runs the CompilerWarnings task after a minimal amount of > sanity checking, instead of having to wait for a task running all tests, > without the waste of running it immediately and failing all the different > configurations, which takes forever. Well, I don't hate it. But I don't think you should call "ccache -z": On Tue, Oct 18, 2022 at 12:09:30PM -0500, Justin Pryzby wrote: > I realized that ccache -z clears out not only the global stats, but the > per-file cache stats (from which the global stats are derived) - which > obviously makes the cache work poorly. -- Justin
Hi, On 2022-11-16 21:58:39 -0600, Justin Pryzby wrote: > On Wed, Nov 16, 2022 at 07:48:14PM -0800, Andres Freund wrote: > > I've used this a bunch on personal branches, and I think it's the way to > > go. It doesn't take long, saves a lot of cycles when one pushes something > > broken. Starts to runs the CompilerWarnings task after a minimal amount of > > sanity checking, instead of having to wait for a task running all tests, > > without the waste of running it immediately and failing all the different > > configurations, which takes forever. > > Well, I don't hate it. > > But I don't think you should call "ccache -z": Agreed - that was really just for "development" of the task. I also don't like my "cores_script". Not quite sure yet how to do that more cleanly. Greetings, Andres Freund
On Wed, Nov 16, 2022 at 08:08:32PM -0800, Andres Freund wrote: > I also don't like my "cores_script". Not quite sure yet how to do that > more cleanly. I don't know which is cleaner: ls /core* && mv /tmp/core* /tmp/cores find / -maxdepth 1 -type f -name 'core*' -print0 | xargs -r0 mv -vt /tmp/cores for a in /core*; do [ ! -e "$a" ] || mv "$a" /tmp/cores; done
On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote: > > +# To avoid unnecessarily spinning up a lot of VMs / containers for entirely > +# broken commits, have a very minimal test that all others depend on. > +task: > + name: SanityCheck Maybe this should be named 00-SanityCheck, so it sorts first in cfbot ? Also, if CompilerWarnings doesn't depend on Linux, that means those two tasks will normally start and run simultaneously, which means a single branch will use all 8 of the linux CPUs available from cirrus. Is that intentional? -- Justin
Hi, On 2022-11-19 14:22:20 -0600, Justin Pryzby wrote: > On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote: > > > > +# To avoid unnecessarily spinning up a lot of VMs / containers for entirely > > +# broken commits, have a very minimal test that all others depend on. > > +task: > > + name: SanityCheck > > Maybe this should be named 00-SanityCheck, so it sorts first in cfbot ? Hm. Perhaps cfbot could just use the sorting from cirrus? I don't really like the idea of making the names more confusing with numbered prefixes, particularly when only used for some but not other tasks. > Also, if CompilerWarnings doesn't depend on Linux, that means those two > tasks will normally start and run simultaneously, which means a single > branch will use all 8 of the linux CPUs available from cirrus. Is that > intentional? I don't think that'd really make anything worse. But perhaps we could just reduce the CPU count for linux autoconf by 1? I suspect that even with asan enabled it'd still be roughly even with the rest. I'll try to repost a version of the ubsan/asan patch together with the sanitycheck patch and see how that looks. Greetings, Andres Freund
Hi, On 2022-11-19 13:18:54 -0800, Andres Freund wrote: > > Also, if CompilerWarnings doesn't depend on Linux, that means those two > > tasks will normally start and run simultaneously, which means a single > > branch will use all 8 of the linux CPUs available from cirrus. Is that > > intentional? > > I don't think that'd really make anything worse. But perhaps we could just > reduce the CPU count for linux autoconf by 1? I suspect that even with asan > enabled it'd still be roughly even with the rest. Hm, that doesn't suffice, because we allow 4 cores for the warnings task. The limit for cirrus is 16 linux CPUs though, not 8. We'll temporarily go up to 12 due to CompilerWarnings after the change. But I think that's fine, because we'd previously use the same amount of CPUs, just some of it sequentially. From the POV of linux CPUs we'd still be able to start a second task concurrently without delaying the sanitycheck task, and then at max delaying one of the other linux tasks (meson, autoconf, compiler warnings). The limit is, and continues to be, be the number of concurrent macos VMs. Might be better after moving to m1 macs. Greetings, Andres Freund
On Sat, Nov 19, 2022 at 01:18:54PM -0800, Andres Freund wrote: > > Also, if CompilerWarnings doesn't depend on Linux, that means those two > > tasks will normally start and run simultaneously, which means a single > > branch will use all 8 of the linux CPUs available from cirrus. Is that > > intentional? > > I don't think that'd really make anything worse. But perhaps we could just > reduce the CPU count for linux autoconf by 1? I didn't understand the goal of "reducing by one" ? Up to now, most tasks are using half of the available CPUs, which seemed deliberate. Like maybe to allow running two branches simultaneously (that doesn't necessarily work well with ccache, though). On Sat, Nov 19, 2022 at 01:35:17PM -0800, Andres Freund wrote: > The limit for cirrus is 16 linux CPUs though, not 8. Oh. Then I don't see any issue. > We'll temporarily go up to 12 due to CompilerWarnings after the change. What do you mean "temporarily" ? I think you're implying that the Warnings task is fast but (at least right now) it is not. Note that the most recent "code coverage" task is built into the linux-autoconf task, and slows it down some more. That's because it's the only remaining in-tree build, and I aimed to only show coverage for changed files (I know you questioned whether that was okay, but to me it still seems to be valuable, even though it obviously doesn't show changes outside of those files). And I couldn't see how to map from "object filename to source file" with meson, although I guess it's possible with instrospection. I haven't re-sent that patch because it's waiting on cfbot changes. -- Justin
Hi, On 2022-11-19 15:45:06 -0600, Justin Pryzby wrote: > What do you mean "temporarily" ? I think you're implying that the > Warnings task is fast but (at least right now) it is not. In the sense that we don't need all CPUs until the whole commit has finished testing (none of the tasks are the slowest task, even after ubsan/asan). As soon as one of the linux tests has finished for one commit, another task in a concurrently tested commit can start. Whereas that's not the case for macos, due to the VM limit. (cfbot has double the limits, because it has a 10$/mo account) > Note that the most recent "code coverage" task is built into the > linux-autoconf task, and slows it down some more. That's because it's > the only remaining in-tree build, and I aimed to only show coverage for > changed files (I know you questioned whether that was okay, but to me it > still seems to be valuable, even though it obviously doesn't show > changes outside of those files). I think we shouldn't add further tests using autoconf, that'll just mean we'll have to do the work changing that test at some later point. > And I couldn't see how to map from > "object filename to source file" with meson, although I guess it's > possible with instrospection. I haven't re-sent that patch because it's > waiting on cfbot changes. The object files should have that in their metadata, fwiw. Greetings, Andres Freund
Hi, On 2022-11-19 13:18:54 -0800, Andres Freund wrote: > I'll try to repost a version of the ubsan/asan patch together with the > sanitycheck patch and see how that looks. I just pushed the prerequisite patch making UBSAN_OPTIONS work. Attached is 1) addition of SanityCheck 2) use of asan and ubsan+alignment san to the linux tasks. I went with a variation of the find command for SanityCheck's cores_script, but used -exec to invoke mv, as that results in a nicer looking commandline imo. Previously the SanityCheck patch did trigger warnings about only_if not matching, despite SanityCheck not having an only_if, but I reported that as a bug to cirrus-ci, and they fixed that. Pretty happy with this. Greetings, Andres Freund
Attachment
Hi, On 2022-11-13 17:53:04 -0600, Justin Pryzby wrote: > On Fri, Nov 04, 2022 at 06:59:46PM -0700, Andres Freund wrote: > > > diff --git a/.cirrus.yml b/.cirrus.yml > > > index 9f2282471a9..99ac09dc679 100644 > > > --- a/.cirrus.yml > > > +++ b/.cirrus.yml > > > @@ -451,12 +451,20 @@ task: > > > > > > build_script: | > > > vcvarsall x64 > > > - ninja -C build > > > + ninja -C build |tee build/meson-logs/build.txt > > > + REM Since pipes lose exit status of the preceding command, rerun compilation, > > > + REM without the pipe exiting now if it fails, rather than trying to run checks > > > + ninja -C build > nul > > > > This seems mighty grotty :(. but I guess it's quick enough not worry about, > > and I can't come up with a better plan. > > > > It doesn't seem quite right to redirect into meson-logs/ to me, my > > interpretation is that that's "meson's namespace". Why not just store it in > > build/? > > I put it there so it'd be included with the build artifacts. Wouldn't just naming it build-warnings.log suffice? I don't think we want to actually upload build.txt - it already is captured. > > > From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001 > > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > > Date: Sat, 26 Feb 2022 19:34:35 -0600 > > > Subject: [PATCH 5/8] cirrus: build docs as a separate task.. > > > > + # Exercise HTML and other docs: > > > + ninja_docs_build_script: | > > > + mkdir build.ninja > > > + cd build.ninja > > > > Perhaps build-ninja instead? build.ninja is the filename for ninja's build > > instructions, so it seems a bit confusing. > > Sure. > > Do you think building docs with both autoconf and meson is what's > desirable here ? Not sure. > I'm not sure if this ought to be combined with/before/after your "move > compilerwarnings task to meson" patch? (Regarding that patch: I > mentioned that it shouldn't use ccache -C, and it should use > meson_log_artifacts.) TBH, I'm not quite sure a separate docs task does really still make sense after the SanityCheck task. It's worth building the docs even if some flappy test fails, but I don't think we should try to build the docs if the code doesn't even compile, in all likelihood a lot more is wrong in that case. > > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > > Date: Tue, 26 Jul 2022 20:30:02 -0500 > > > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys.. > > > > > > Since otherwise, building with ci-os-only will probably fail to use the > > > normal cache, since the cache key is computed using both the task name > > > and its *index* in the list of caches (internal/executor/cache.go:184). > > > > Seems like this would potentially better addressed by reporting a bug to the > > cirrus folks? > > You said that before, but I don't think so - since they wrote code to do > that, it's odd to file a bug that says that the behavior is wrong. I am > curious why, but it seems delibrate. > > https://www.postgresql.org/message-id/20220828171029.GO2342%40telsasoft.com I suspect this is just about dealing with unnamed tasks and could be handled by just mixing in CI_NODE_INDEX if the task name isn't set. I pushed a version of 0007-cirrus-clean-up-windows-task.patch. I didn't rename the task as I would like to add a msbuild version of the task at some point (it's pretty easy to break msbuild but not ninja unfortunately). In additional I also removed NO_TEMP_INSTALL. Greetings, Andres Freund
On Mon, Nov 21, 2022 at 02:45:42PM -0800, Andres Freund wrote: > > > > + ninja -C build |tee build/meson-logs/build.txt > > > > + REM Since pipes lose exit status of the preceding command, rerun compilation, > > > > + REM without the pipe exiting now if it fails, rather than trying to run checks > > > > + ninja -C build > nul > > > > > > This seems mighty grotty :(. but I guess it's quick enough not worry about, > > > and I can't come up with a better plan. > > > > > > It doesn't seem quite right to redirect into meson-logs/ to me, my > > > interpretation is that that's "meson's namespace". Why not just store it in > > > build/? > > > > I put it there so it'd be included with the build artifacts. > > Wouldn't just naming it build-warnings.log suffice? I don't think we > want to actually upload build.txt - it already is captured. Originally, I wanted the input and the output to be available as files and not just in cirrus' web GUI, but maybe that's not important anymore. I rewrote it again. > > I'm not sure if this ought to be combined with/before/after your "move > > compilerwarnings task to meson" patch? (Regarding that patch: I > > mentioned that it shouldn't use ccache -C, and it should use > > meson_log_artifacts.) > > TBH, I'm not quite sure a separate docs task does really still make > sense after the SanityCheck task. It's worth building the docs even if > some flappy test fails, but I don't think we should try to build the > docs if the code doesn't even compile, in all likelihood a lot more is > wrong in that case. It'd be okay either way. I had split it out to 1) isolate the changes in the "upload changed docs as artifacts" patch; and, 2) so the docs artifacts are visible in a cfbot link called "Documentation"; and, 3) so the docs task runs without a dependency on "Linux", since (as you said) docs/errors are worth showing/reviewing/reporting/addressing separately from test errors (perhaps similar to compiler warnings...). I shuffled my branch around and sending now the current "docs" patches, but I suppose this is waiting on the "convert CompilerWarnings task to meson" patch. -- Justin
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-macos-switch-to-macos_instance-M1.patch
- 0003-cirrus-macos-update-to-macos-ventura.patch
- 0004-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch
- 0005-cirrus-clean-up-typos.patch
- 0006-cirrus-build-docs-as-a-separate-task.patch
- 0007-cirrus-warnings-use-a-single-common-always-block.patch
- 0008-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0009-WIP-show-changed-docs-with-meson.patch
- 0010-html-index-file.patch
On Wed, Nov 23, 2022 at 11:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > [PATCH 02/10] cirrus/macos: switch to "macos_instance" / M1.. Duelling patches. Bilal's patch[1] uses the matrix feature to run the tests on both Intel and ARM, which made sense when he proposed it, but according to Cirrus CI warnings, the Intel instances are about to go away. So I think we just need your smaller change to switch the instance type. As for the pathname change, there is another place that knows where Homebrew lives, in ldap/001_auth. Fixed in the attached. That test just SKIPs if it can't find the binary, making it harder to notice. Standardising our approach here might make sense for a later patch. As for the kerberos test, Bilal's patch may well be a better idea (it adds MacPorts for one thing), and so I'll suggest rebasing that, but here I just wanted the minimum mechanical fix to avoid breaking on the 1st of Jan. I plan to push this soon if there are no objections. Then discussion of Bilal's patch can continue. > [PATCH 03/10] cirrus/macos: update to macos ventura I don't know any reason not to push this one too, but it's not time critical. [1] https://www.postgresql.org/message-id/flat/CAN55FZ2R%2BXufuVgJ8ew_yDBk48PgXEBvyKNvnNdTTVyczbQj0g%40mail.gmail.com
Attachment
On Fri, 30 Dec 2022 at 09:29, Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Nov 23, 2022 at 11:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > [PATCH 02/10] cirrus/macos: switch to "macos_instance" / M1.. > > Duelling patches. > > Bilal's patch[1] uses the matrix feature to run the tests on both > Intel and ARM, which made sense when he proposed it, but according to > Cirrus CI warnings, the Intel instances are about to go away. So I > think we just need your smaller change to switch the instance type. > > As for the pathname change, there is another place that knows where > Homebrew lives, in ldap/001_auth. Fixed in the attached. That test > just SKIPs if it can't find the binary, making it harder to notice. > Standardising our approach here might make sense for a later patch. > As for the kerberos test, Bilal's patch may well be a better idea (it > adds MacPorts for one thing), and so I'll suggest rebasing that, but > here I just wanted the minimum mechanical fix to avoid breaking on the > 1st of Jan. > > I plan to push this soon if there are no objections. Then discussion > of Bilal's patch can continue. > > > [PATCH 03/10] cirrus/macos: update to macos ventura > > I don't know any reason not to push this one too, but it's not time critical. The patch does not apply on top of HEAD as in [1], please post a rebased patch: === Applying patches on top of PostgreSQL commit ID e351f85418313e97c203c73181757a007dfda6d0 === === applying patch ./0001-ci-Change-macOS-builds-from-Intel-to-ARM.patch patching file .cirrus.yml Hunk #1 FAILED at 407. Hunk #2 FAILED at 428. Hunk #3 FAILED at 475. 3 out of 3 hunks FAILED -- saving rejects to file .cirrus.yml.rej patching file src/test/kerberos/t/001_auth.pl Hunk #1 FAILED at 32. 1 out of 1 hunk FAILED -- saving rejects to file src/test/kerberos/t/001_auth.pl.rej patching file src/test/ldap/t/001_auth.pl Hunk #1 FAILED at 21. 1 out of 1 hunk FAILED -- saving rejects to file src/test/ldap/t/001_auth.pl.rej [1] - http://cfbot.cputube.org/patch_41_3709.log Regards, Vigneh
On Mon, Nov 21, 2022 at 02:45:42PM -0800, Andres Freund wrote: > On 2022-11-13 17:53:04 -0600, Justin Pryzby wrote: > > > > From: Justin Pryzby <pryzbyj@telsasoft.com> > > > > Date: Tue, 26 Jul 2022 20:30:02 -0500 > > > > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys.. > > > > > > > > Since otherwise, building with ci-os-only will probably fail to use the > > > > normal cache, since the cache key is computed using both the task name > > > > and its *index* in the list of caches (internal/executor/cache.go:184). > > > > > > Seems like this would potentially better addressed by reporting a bug to the > > > cirrus folks? > > > > You said that before, but I don't think so - since they wrote code to do > > that, it's odd to file a bug that says that the behavior is wrong. I am > > curious why, but it seems delibrate. > > > > https://www.postgresql.org/message-id/20220828171029.GO2342%40telsasoft.com > > I suspect this is just about dealing with unnamed tasks and could be > handled by just mixing in CI_NODE_INDEX if the task name isn't set. I suppose it was their way of dealing with this: |Cache artifacts are shared between tasks, so two caches with the same |name on e.g. Linux containers and macOS VMs will share the same set of |files. This may introduce binary incompatibility between caches. To |avoid that, add echo $CIRRUS_OS into fingerprint_script or use |$CIRRUS_OS in fingerprint_key, which will distinguish caches based on |OS. To make caches work automatically, without having to know to name them differently. -- Justin
On Tue, Nov 22, 2022 at 04:57:44PM -0600, Justin Pryzby wrote: > I shuffled my branch around and sending now the current "docs" patches, > but I suppose this is waiting on the "convert CompilerWarnings task to > meson" patch. In case it's not, here's a version to do that now.
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-macos-update-to-macos-ventura.patch
- 0003-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch
- 0004-cirrus-freebsd-define-ENFORCE_REGRESSION_TEST_NAME_R.patch
- 0005-cirrus-warnings-use-a-single-common-always-block.patch
- 0006-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0007-WIP-ci-meson-allow-showing-only-failed-tests.patch
The autoconf system runs all tap tests in t/*.pl, but meson requires enumerating them in ./meson.build. This checks for and finds no missing tests in the current tree: $ for pl in `find src contrib -path '*/t/*.pl'`; do base=${pl##*/}; dir=${pl%/*}; meson=${dir%/*}/meson.build; grep "$base""$meson" >/dev/null || echo "$base is missing from $meson"; done However, this finds two real problems and one false-positive with missing regress/isolation tests: $ for makefile in `find src contrib -name Makefile`; do for testname in `sed -r '/^(REGRESS|ISOLATION) =/!d; s///; :l; /\\\\$/{s///;N; b l}; s/\n//g' "$makefile"`; do meson=${makefile%/Makefile}/meson.build; grep -Fw "$testname" "$meson" >/dev/null|| echo "$testname is missing from $meson"; done; done guc_privs is missing from src/test/modules/unsafe_tests/meson.build oldextversions is missing from contrib/pg_stat_statements/meson.build $(CF_PGP_TESTS) is missing from contrib/pgcrypto/meson.build I also tried but failed to write something to warn if "meson test" was run with a list of tests but without tmp_install. Help wanted. I propose to put something like this into "SanityCheck". -- Justin
Hi, On 2023-01-17 11:35:09 -0600, Justin Pryzby wrote: > The autoconf system runs all tap tests in t/*.pl, but meson requires > enumerating them in ./meson.build. Yes. It was a mistake that we ever used t/*.pl for make. For one, it means that make can't control concurrency meaningfully, due to the varying number of tests run with one prove instance. It's also the only thing that tied us to prove, which is one hell of a buggy mess. > This checks for and finds no missing tests in the current tree: > > $ for pl in `find src contrib -path '*/t/*.pl'`; do base=${pl##*/}; dir=${pl%/*}; meson=${dir%/*}/meson.build; grep "$base""$meson" >/dev/null || echo "$base is missing from $meson"; done Likely because I do something similar locally. # prep m test --list > /tmp/tests.txt # Check if all tap tests are known to meson for f in $(git ls-files|grep -E '(t|test)/.*.pl$'|sort);do t=$(echo $f|sed -E -e 's/^.*\/([^/]*)\/(t|test)\/(.*)\.pl$/\1\/\3/');grep-q -L $t /tmp/tests.txt |\ | echo $f;done # Check if all regression / isolation tests are known to meson # # Expected to find plpgsql due to extra 'src' directory level, src/test/mb # because it's not run anywhere and sepgsql, because that's not tested yet for d in $(find ~/src/postgresql -type d \( -name sql -or -name specs \) );do t=$(basename $(dirname $d)); grep -q -L $t/tmp/tests.txt || echo $d; done > However, this finds two real problems and one false-positive with > missing regress/isolation tests: Which the above does *not* test for. Good catch. I'll push the fix for those as soon as tests passed on my personal repo. > $ for makefile in `find src contrib -name Makefile`; do for testname in `sed -r '/^(REGRESS|ISOLATION) =/!d; s///; :l;/\\\\$/{s///; N; b l}; s/\n//g' "$makefile"`; do meson=${makefile%/Makefile}/meson.build; grep -Fw "$testname" "$meson">/dev/null || echo "$testname is missing from $meson"; done; done > guc_privs is missing from src/test/modules/unsafe_tests/meson.build Yep. That got added during the development of the meson port, so it's not too surprising. > oldextversions is missing from contrib/pg_stat_statements/meson.build This one however, is odd. Not sure how that happened. > $(CF_PGP_TESTS) is missing from contrib/pgcrypto/meson.build Assume that's the false positive? > I also tried but failed to write something to warn if "meson test" was > run with a list of tests but without tmp_install. Help wanted. That doesn't even catch the worst case - when there's tmp_install, but it's too old. The proper solution would be to make the creation of tmp_install a dependency of the relevant tests. Unfortunately meson still auto-propages those to dependencies of the 'all' target (for historical reasons), and creating the temp install is too slow on some machines to make that tolerable. I think there's an open PR to change that. Once that's in a released meson version that's in somewhat widespread use, we should change that. The other path forward is to allow running the tests without tmp_install. There's not that much we'd need to allow running directly from the source tree - the biggest thing is a way to load extensions from a list of paths. This option is especially attractive because it'd allow to run individual tests without a fully built sourcetree. No need to build other binaries when you just want to test psql, or more extremely, pg_test_timing. > I propose to put something like this into "SanityCheck". Perhaps we instead could add it as a separate "meson-only" test? Then it'd fail on developer's machines, instead of later in CI. We could pass the test information from the 'tests' array, or it could look at the metadata in meson-info/intro-tests.json Greetings, Andres Freund
On Tue, Jan 17, 2023 at 11:56:42AM -0800, Andres Freund wrote: > > $(CF_PGP_TESTS) is missing from contrib/pgcrypto/meson.build > > Assume that's the false positive? Yes > > I also tried but failed to write something to warn if "meson test" was > > run with a list of tests but without tmp_install. Help wanted. > > That doesn't even catch the worst case - when there's tmp_install, but it's > too old. I don't understand what you mean by "too old" ? > > I propose to put something like this into "SanityCheck". > > Perhaps we instead could add it as a separate "meson-only" test? Then it'd > fail on developer's machines, instead of later in CI. We could pass the test > information from the 'tests' array, or it could look at the metadata in > meson-info/intro-tests.json I guess you mean that it should be *able* to fail on developer machines *in addition* to cirrusci. But, a meson-only test might not be so helpful, as it assumes that the developer is using meson, in which case the problem would tend not to have occured in the first place. BTW I also noticed that: meson.build:meson_binpath_r = run_command(python, 'src/tools/find_meson', check: true) meson.build- meson.build-if meson_binpath_r.returncode() != 0 or meson_binpath_r.stdout() == '' meson.build- error('huh, could not run find_meson.\nerrcode: @0@\nstdout: @1@\nstderr: @2@'.format( The return code will never be nonzero since check==true, right ? -- Justin
On Fri, Dec 30, 2022 at 4:59 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Nov 23, 2022 at 11:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > [PATCH 03/10] cirrus/macos: update to macos ventura > > I don't know any reason not to push this one too, but it's not time critical. Some observations: * macOS has a new release every year in June[1] * updates cease after three years[1] * thus three releases are in support (by that definition) at a time * we need an image on Cirrus; 13 appeared ~1 month later[2] * we need Homebrew support; 13 appeared ~3 months later[3] * we have 13 and 12 in the buildfarm, but no 11 * it's common for developers but uncommon for servers/deployment So what should our policy be on when to roll the CI image forward? I guess around New Year/now (~6 months after release) is a good time and we should just do it. Anyone got a reason why we should wait? Our other CI OSes have slower major version release cycles and longer lives, so it's not quite the same hamster wheel of upgrades. [1] https://en.wikipedia.org/wiki/MacOS_version_history#Releases [2] https://github.com/orgs/cirruslabs/packages?tab=packages&q=macos [3] https://brew.sh/2022/09/07/homebrew-3.6.0/
Thomas Munro <thomas.munro@gmail.com> writes: > Some observations: > * macOS has a new release every year in June[1] > * updates cease after three years[1] > * thus three releases are in support (by that definition) at a time > * we need an image on Cirrus; 13 appeared ~1 month later[2] > * we need Homebrew support; 13 appeared ~3 months later[3] > * we have 13 and 12 in the buildfarm, but no 11 > * it's common for developers but uncommon for servers/deployment > So what should our policy be on when to roll the CI image forward? I > guess around New Year/now (~6 months after release) is a good time and > we should just do it. Anyone got a reason why we should wait? Our > other CI OSes have slower major version release cycles and longer > lives, so it's not quite the same hamster wheel of upgrades. I'd argue that developers are probably the kind of people who update their OS sooner rather than later --- I've usually updated my laptop and at least one BF animal to $latest macOS within a month or so of the dot-zero release. So waiting 6 months seems to me like CI will be behind the users, which will be unhelpful. I'd rather drop the oldest release sooner, if we need to hold down the number of macOS revisions under test. regards, tom lane
On Thu, Feb 2, 2023 at 2:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Some observations: > > So what should our policy be on when to roll the CI image forward? I > > guess around New Year/now (~6 months after release) is a good time and > > we should just do it. Anyone got a reason why we should wait? Our > > other CI OSes have slower major version release cycles and longer > > lives, so it's not quite the same hamster wheel of upgrades. > > I'd argue that developers are probably the kind of people who update > their OS sooner rather than later --- I've usually updated my laptop > and at least one BF animal to $latest macOS within a month or so of > the dot-zero release. So waiting 6 months seems to me like CI will be > behind the users, which will be unhelpful. I'd rather drop the oldest > release sooner, if we need to hold down the number of macOS revisions > under test. Cool. Done. Out of curiosity, I wondered how the "graphical installer" packagers like EDB and Postgres.app choose a target, when Apple is moving so fast. I see that the current EDB installers target 10.14 for PG15, which was 5 years old at initial release, and thus already EOL'd for 2 years. Postgres.app goes back one more year. In other words, even though that preadv/pwritev "decl" stuff is unnecessary for PG16 if you think we should only target OSes that the vendor still supports (which will be 12, 13, 14), someone would still shout at me if I removed it.
rebased, and re-including a patch to show code coverage of changed files. a5b3e50d922 cirrus/windows: add compiler_warnings_script 4c98dcb0e03 cirrus/freebsd: run with more CPUs+RAM and do not repartition aaeef938ed4 cirrus/freebsd: define ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS 9baf41674ad pg_upgrade: tap test: exercise --link and --clone 7e09035f588 WIP: ci/meson: allow showing only failed tests .. e4534821ef5 cirrus/ccache: use G rather than GB suffix.. 185d1c3ed13 cirrus: code coverage 5dace84a038 cirrus: upload changed html docs as artifacts 852360330ef +html index file
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch
- 0003-cirrus-freebsd-define-ENFORCE_REGRESSION_TEST_NAME_R.patch
- 0004-pg_upgrade-tap-test-exercise-link-and-clone.patch
- 0005-WIP-ci-meson-allow-showing-only-failed-tests.patch
- 0006-cirrus-ccache-use-G-rather-than-GB-suffix.patch
- 0007-cirrus-code-coverage.patch
- 0008-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0009-html-index-file.patch
On 03.02.23 15:26, Justin Pryzby wrote: > rebased, and re-including a patch to show code coverage of changed > files. This constant flow of patches under one subject doesn't lend itself well to the commit fest model of trying to finish things up. I can't quite tell which of these patches are ready and agreed upon, and which ones are work in progress or experimental. > e4534821ef5 cirrus/ccache: use G rather than GB suffix.. This one seems obvious. I have committed it. > 9baf41674ad pg_upgrade: tap test: exercise --link and --clone This seems like a good idea. > 7e09035f588 WIP: ci/meson: allow showing only failed tests .. I'm not sure I like this one. I sometimes look up the logs of non-failed tests to compare them with failed tests, to get context to could lead to failures. Maybe we can make this behavior adjustable. But I've not been bothered by the current behavior.
On Mon, Mar 13, 2023 at 07:39:52AM +0100, Peter Eisentraut wrote: > On 03.02.23 15:26, Justin Pryzby wrote: > > rebased, and re-including a patch to show code coverage of changed > > files. > > This constant flow of patches under one subject doesn't lend itself well to > the commit fest model of trying to finish things up. > I can't quite tell which of these patches are ready and agreed upon, > and which ones are work in progress or experimental. I'm soliticing feedback on those patches that I've sent recently - I've elided patches if they have some unresolved issue. I'm not aware of any loose ends other than what's updated here: - cirrus: code coverage I changed this to also run an "initial" coverage report before running tests. It's not clear to me what effect that has, though... Andres seems to think it's a problem that this shows coverage only for files that were actually changed. But that's what's intended; it's sufficient to see if new code is being hit by tests. It would be slow and take a lot of extra space to upload a coverage report for every patch, every day. It might be nice for cfbot to show how test coverage changed in the affected files: -15% / +25%. - cirrus: upload changed html docs as artifacts Fixed an "only_if" line so cfbot will run the "warnings" task. Maybe this path is waiting on Andres' patch to "move CompilerWarnings to meson" ? > > 7e09035f588 WIP: ci/meson: allow showing only failed tests .. > > I'm not sure I like this one. I sometimes look up the logs of non-failed > tests to compare them with failed tests, to get context to could lead to > failures. Maybe we can make this behavior adjustable. But I've not been > bothered by the current behavior. It's adjustable by un/setting the environment variable. I'm surprised to hear that anyone using cirrusci (with or without cfbot) wouldn't prefer the behavior this patch implements. It's annoying to search find the logs for the (typically exactly one) failing test in cirrus' directory of 200some test artifacts. We're also uploading a lot of logs for every failure. (But I suppose this might break cfbot's new client side parsing of things like build logs...) -- Justin
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch
- 0003-cirrus-freebsd-define-ENFORCE_REGRESSION_TEST_NAME_R.patch
- 0004-pg_upgrade-tap-test-exercise-link-and-clone.patch
- 0005-WIP-ci-meson-allow-showing-only-failed-tests.patch
- 0006-cirrus-code-coverage.patch
- 0007-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0008-html-index-file.patch
On 14.03.23 05:56, Justin Pryzby wrote: > I'm soliticing feedback on those patches that I've sent recently - I've > elided patches if they have some unresolved issue. > [PATCH 1/8] cirrus/windows: add compiler_warnings_script Needs a better description of what it actually does. (And fewer "I'm not sure how to write this ..." comments ;-) ) It looks like it would fail the build if there is a compiler warning in the Windows VS task? Shouldn't that be done in the CompilerWarnings task? Also, I see a bunch of warnings in the current output from that task. These should be cleaned up in any case before we can let a thing like this loose. (The warnings are all like C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro redefinition so possibly a single fix can address them all.) > [PATCH 2/8] cirrus/freebsd: run with more CPUs+RAM and do not repartition I don't know enough about this. Maybe Andres or Thomas want to take this. No concerns if it's safe. > [PATCH 3/8] cirrus/freebsd: define ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS Looks sensible. > [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone I haven't been able to get any changes to the test run times outside of noise from this. But some more coverage is sensible in any case. I'm concerned that with this change, the only platform that tests --copy is Windows, but Windows has a separate code path for copy. So we should leave one Unix platform to test --copy. Maybe have FreeBSD test --link and macOS test --clone and leave the others with --copy? > [PATCH 5/8] WIP: ci/meson: allow showing only failed tests .. >>> 7e09035f588 WIP: ci/meson: allow showing only failed tests .. >> >> I'm not sure I like this one. I sometimes look up the logs of non-failed >> tests to compare them with failed tests, to get context to could lead to >> failures. Maybe we can make this behavior adjustable. But I've not been >> bothered by the current behavior. > > It's adjustable by un/setting the environment variable. > > I'm surprised to hear that anyone using cirrusci (with or without cfbot) > wouldn't prefer the behavior this patch implements. It's annoying to > search find the logs for the (typically exactly one) failing test in > cirrus' directory of 200some test artifacts. We're also uploading a lot > of logs for every failure. (But I suppose this might break cfbot's new > client side parsing of things like build logs...) One thing that actually annoys me that is that a successful run does not upload any test artifacts at all. So, I guess I'm just of a different opinion here. > [PATCH 6/8] cirrus: code coverage This adds -Db_coverage=true to the FreeBSD task. This has a significant impact on the build time. (+50% at least, it appears.) I'm not sure the approach here makes sense. For example, if you add a new test, the set of changed files is just that test. So you won't get any report on what coverage change the test has caused. Also, I don't think I trust the numbers from the meson coverage stuff yet. See for example <https://www.postgresql.org/message-id/Y/3AI+/MqKcjLk/T@paquier.xyz>. > [PATCH 7/8] cirrus: upload changed html docs as artifacts > [PATCH 8/8] +html index file This builds the docs twice and then analyzes the differences between the two builds. This also affects the build times quite significantly. How useful is this actually? People who want to look at the docs can build them locally. There are no platform dependencies or anything like that where having them built elsewhere is of advantage.
On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote: > On 14.03.23 05:56, Justin Pryzby wrote: > > I'm soliticing feedback on those patches that I've sent recently - I've > > elided patches if they have some unresolved issue. > > > [PATCH 1/8] cirrus/windows: add compiler_warnings_script > > Needs a better description of what it actually does. (And fewer "I'm not > sure how to write this ..." comments ;-) ) It looks like it would fail the > build if there is a compiler warning in the Windows VS task? Shouldn't that > be done in the CompilerWarnings task? The goal is to fail due to warnings only after running tests. https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de "Probably worth scripting something to make the windows task error out if there had been warnings, but only after running the tests." CompilerWarnings runs in a linux environment running with -Werror. This patch scrapes warnings out of MSVC, since (at least historically) it's too slow to run a separate windows VM to compile with -Werror. > Also, I see a bunch of warnings in the current output from that task. These > should be cleaned up in any case before we can let a thing like this loose. Yeah (and I mentioned those myself). As it stands, my patch also "breaks" everytime someone's else's patch introduces warnings. I included links demonstrating its failures. I agree that it's not okay to merge the patch when it's currently failing, but I cannot dig into that other issue right now. > > [PATCH 6/8] cirrus: code coverage > > This adds -Db_coverage=true to the FreeBSD task. This has a significant > impact on the build time. (+50% at least, it appears.) Yes - but with the CPUs added by the prior patch, the freebsd task is faster than it is currently. And its 8min runtime would match the other tasks well. > I'm not sure the approach here makes sense. For example, if you add a new > test, the set of changed files is just that test. So you won't get any > report on what coverage change the test has caused. The coverage report that I proposed clearly doesn't handle that case - it's not intended to. Showing a full coverage report is somewhat slow to generate, probably unreasonable to upload for every patch, every day, and not very interesting since it's at least 99% duplicative. The goal is to show a coverage report for new code for every patch. What fraction of the time do you think the patch author, reviewer or committer have looked at a coverage report? It's not a question of whether it's possible to do so locally, but of whether it's actually done. > Also, I don't think I trust the numbers from the meson coverage stuff yet. > See for example > <https://www.postgresql.org/message-id/Y/3AI+/MqKcjLk/T@paquier.xyz>. I'm not using the meson coverage target. I could instead add CFLAGS=--coverage. Anyway, getting a scalar value like "83%" might be interesting to show in cfbot, but it's not the main goal here. > > [PATCH 7/8] cirrus: upload changed html docs as artifacts > > [PATCH 8/8] +html index file > > This builds the docs twice and then analyzes the differences between the two > builds. This also affects the build times quite significantly. The main goal is to upload the changed docs. > People who want to look at the docs can build them locally. This makes the docs for every patch available for reviewers, without needing a build environment. An easy goal would be if documentation for every patch was reviewed by a native english speaker. Right now that's not consistently true. > How useful is this actually? I'm surprised if there's any question about the merits of making documentation easily available for review. Several people have agreed; one person mailed me privately specifically to ask how to show HTML docs on cirrusci. Anyway, all this stuff is best addressed either before or after the CF. I'll kick the patch forward. Thanks for looking. -- Justin
On 15.03.23 15:56, Justin Pryzby wrote: > I'm surprised if there's any question about the merits of making > documentation easily available for review. Several people have agreed; > one person mailed me privately specifically to ask how to show HTML docs > on cirrusci. > > Anyway, all this stuff is best addressed either before or after the CF. > I'll kick the patch forward. Thanks for looking. I suppose this depends on what you want to use this for. If your use is to prepare and lay out as much information as possible about a patch for a reviewer, some of your ideas make sense. I'm using this primarily to quickly test local work in progress. So I want a quick feedback cycle. I don't need it to show me which HTML docs changed, for example. So maybe there need to be different modes.
On Wed, Mar 15, 2023 at 04:57:34PM +0100, Peter Eisentraut wrote: > On 15.03.23 15:56, Justin Pryzby wrote: > > I'm surprised if there's any question about the merits of making > > documentation easily available for review. Several people have agreed; > > one person mailed me privately specifically to ask how to show HTML docs > > on cirrusci. > > > > Anyway, all this stuff is best addressed either before or after the CF. > > I'll kick the patch forward. Thanks for looking. > > I suppose this depends on what you want to use this for. If your use is to > prepare and lay out as much information as possible about a patch for a > reviewer, some of your ideas make sense. > > I'm using this primarily to quickly test local work in progress. So I want > a quick feedback cycle. I don't need it to show me which HTML docs changed, > for example. > > So maybe there need to be different modes. I'm opened to that - for example, mingw is currently opt-in. Maybe this should be a separate task - it was implemented like that based on an earlier suggestion (and then changed back again based on another suggestion). The task could be triggered manually or by cfbot's message. But a primary goal for cirrus.yml was to allow developers to do the same things as cfbot, and without everyone needing to reimplement it for themselves. You want quick feedback, like everyone else - but I doubt you disable the documentation build when you don't need it, even though that would shave off a whole minute. And I doubt that you'd comment it out even the documentation was built twice. Anyway - I think this patch is probably waiting on Andres' patch to "convert CompilerWarnings to meson". > > 7e09035f588 WIP: ci/meson: allow showing only failed tests .. > > I'm not sure I like this one. I sometimes look up the logs of non-failed > tests to compare them with failed tests, to get context to could lead to > failures. Maybe we can make this behavior adjustable. But I've not been > bothered by the current behavior. I suggest to try the patch; I doubt you'd prefer the existing behavior. The patch is rebased now that meson is updated to avoid the windows python warnings (thanks Andres). -- Justin
Attachment
- 0001-cirrus-windows-add-compiler_warnings_script.patch
- 0002-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch
- 0003-cirrus-freebsd-define-ENFORCE_REGRESSION_TEST_NAME_R.patch
- 0004-cirrus-002_pg_upgrade-exercise-link-and-clone.patch
- 0005-WIP-ci-meson-allow-showing-only-failed-tests.patch
- 0006-cirrus-show-coverage-report-of-new-code-for-every-pa.patch
- 0007-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0008-html-index-file.patch
On 12.04.23 03:05, Justin Pryzby wrote: > The patch is rebased now that meson is updated to avoid the windows > python warnings (thanks Andres). To keep this moving along, I have committed [PATCH 3/8] cirrus/freebsd: define ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
On Tue, Jan 17, 2023 at 11:35:09AM -0600, Justin Pryzby wrote: > However, this finds two real problems and one false-positive with > missing regress/isolation tests: > > $ for makefile in `find src contrib -name Makefile`; do for testname in `sed -r '/^(REGRESS|ISOLATION) =/!d; s///; :l;/\\\\$/{s///; N; b l}; s/\n//g' "$makefile"`; do meson=${makefile%/Makefile}/meson.build; grep -Fw "$testname" "$meson">/dev/null || echo "$testname is missing from $meson"; done; done And, since 681d9e462: security is missing from contrib/seg/meson.build
On Wed, Jul 12, 2023 at 12:56:17AM -0500, Justin Pryzby wrote: > And, since 681d9e462: > > security is missing from contrib/seg/meson.build Ugh. Good catch! -- Michael
Attachment
On Wed, 12 Jul 2023 at 11:38, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jul 12, 2023 at 12:56:17AM -0500, Justin Pryzby wrote: > > And, since 681d9e462: > > > > security is missing from contrib/seg/meson.build > > Ugh. Good catch! Are we planning to do anything more in this thread, the thread has been idle for more than 7 months. If nothing more is planned for this, I'm planning to close this commitfest entry in this commitfest. Regards, Vignesh
On Wed, Jan 17, 2024 at 05:34:00PM +0530, vignesh C wrote: > Are we planning to do anything more in this thread, the thread has > been idle for more than 7 months. If nothing more is planned for this, > I'm planning to close this commitfest entry in this commitfest. Oops, this went through the cracks. security was still missing in seg's meson.build, so I've just applied a patch to take care of it. I am not spotting any other holes.. -- Michael
Attachment
On Thu, 18 Jan 2024 at 06:46, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jan 17, 2024 at 05:34:00PM +0530, vignesh C wrote: > > Are we planning to do anything more in this thread, the thread has > > been idle for more than 7 months. If nothing more is planned for this, > > I'm planning to close this commitfest entry in this commitfest. > > Oops, this went through the cracks. security was still missing in > seg's meson.build, so I've just applied a patch to take care of it. > I am not spotting any other holes.. Are we planning to do anything more on this? I was not sure if we should move this to next commitfest or return it. Regards, Vignesh
On 2024-Jan-31, vignesh C wrote: > Are we planning to do anything more on this? I was not sure if we > should move this to next commitfest or return it. Well, the patches don't apply anymore since .cirrus.tasks.yml has been created. However, I'm sure we still want [some of] the improvements to the tests in [1]. I can volunteer to rebase the patches in time for the March commitfest, if Justin is not available to do so. If you can please move it forward to the March cf and set it WoA, I'd appreciate that. Thanks [1] https://postgr.es/m/ZA/+mKDX9zWfhD3v@telsasoft.com -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Tiene valor aquel que admite que es un cobarde" (Fernandel)
On Wed, Jan 31, 2024 at 11:59:21AM +0100, Alvaro Herrera wrote: > On 2024-Jan-31, vignesh C wrote: > > > Are we planning to do anything more on this? I was not sure if we > > should move this to next commitfest or return it. > > Well, the patches don't apply anymore since .cirrus.tasks.yml has been > created. However, I'm sure we still want [some of] the improvements > to the tests in [1]. I can volunteer to rebase the patches in time for the > March commitfest, if Justin is not available to do so. If you can > please move it forward to the March cf and set it WoA, I'd appreciate > that. The patches are rebased. A couple were merged since I last rebased them ~10 months ago. The freebsd patch will probably be obsoleted by a patch of Thomas. On Mon, Mar 13, 2023 at 07:39:52AM +0100, Peter Eisentraut wrote: > On 03.02.23 15:26, Justin Pryzby wrote: > > 9baf41674ad pg_upgrade: tap test: exercise --link and --clone > > This seems like a good idea. On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote: > > [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone > > I haven't been able to get any changes to the test run times outside > of noise from this. But some more coverage is sensible in any case. > > I'm concerned that with this change, the only platform that tests > --copy is Windows, but Windows has a separate code path for copy. So > we should leave one Unix platform to test --copy. Maybe have FreeBSD > test --link and macOS test --clone and leave the others with --copy? I addressed Peter's comments, but haven't heard further. The patch to show HTML docs artifacts may be waiting for Andres' patch to convert CompilerWarnings to meson. It may also be waiting on cfbot to avoid squishing all the patches together. I sent various patches to cfbot but haven't heard back. https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com https://www.postgresql.org/message-id/flat/20220623193125.GB22452@telsasoft.com https://github.com/justinpryzby/cfbot/commits/master https://github.com/macdice/cfbot/pulls -- Justin
Attachment
On 13.02.24 20:10, Justin Pryzby wrote: > On Mon, Mar 13, 2023 at 07:39:52AM +0100, Peter Eisentraut wrote: >> On 03.02.23 15:26, Justin Pryzby wrote: >>> 9baf41674ad pg_upgrade: tap test: exercise --link and --clone >> This seems like a good idea. > On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote: >>> [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone >> I haven't been able to get any changes to the test run times outside >> of noise from this. But some more coverage is sensible in any case. >> >> I'm concerned that with this change, the only platform that tests >> --copy is Windows, but Windows has a separate code path for copy. So >> we should leave one Unix platform to test --copy. Maybe have FreeBSD >> test --link and macOS test --clone and leave the others with --copy? > I addressed Peter's comments, but haven't heard further. Ok, I didn't see that my feedback had been addressed. I have committed this patch.
> On 19 Feb 2024, at 11:33, Peter Eisentraut <peter@eisentraut.org> wrote: > > Ok, I didn't see that my feedback had been addressed. I have committed this patch. Justin, Peter, I can't determine actual status of the CF entry [0]. May I ask someone of you to move patch to next CF orclose as committed? Thanks! Best regards, Andrey Borodin. [0] https://commitfest.postgresql.org/47/3709/
On Mon, Apr 08, 2024 at 05:54:10PM +0300, Andrey M. Borodin wrote: > Justin, Peter, I can't determine actual status of the CF entry > [0]. May I ask someone of you to move patch to next CF or close as > committed? 0002 is the only thing committed as of 21a71648d39f. I can see the value in 0001, but the implementation feels awkward. 0003 is wanted. I am personally not sure about 0004 to upload doc artifacts. Similarly. 0005 can already be done with a few clicks on the CI, and the previous run may not be the only one that matters. 0006 makes the doc check phase more complex. In all that, 0003 is something that we should move on with, at least. Moving this entry to the next CF makes sense to me now, to give more time to the other patches, and there's value to be extracted at quick glance. -- Michael
Attachment
On Fri, Jun 24, 2022 at 08:38:50AM +1200, Thomas Munro wrote: > > I've also sent some patches to Thomas for cfbot to help progress some of these > > patches (code coverage and documentation upload as artifacts). > > https://github.com/justinpryzby/cfbot/commits/master > > Thanks, sorry for lack of action, will get to these soon. On Tue, Feb 13, 2024 at 01:10:21PM -0600, Justin Pryzby wrote: > I sent various patches to cfbot but haven't heard back. > https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com > https://www.postgresql.org/message-id/flat/20220623193125.GB22452@telsasoft.com > https://github.com/justinpryzby/cfbot/commits/master > https://github.com/macdice/cfbot/pulls @Thomas: ping I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which can make the builds 2x faster.
Attachment
Hi, Thanks for working on this! On Wed, 12 Jun 2024 at 16:10, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Fri, Jun 24, 2022 at 08:38:50AM +1200, Thomas Munro wrote: > > > I've also sent some patches to Thomas for cfbot to help progress some of these > > > patches (code coverage and documentation upload as artifacts). > > > https://github.com/justinpryzby/cfbot/commits/master > > > > Thanks, sorry for lack of action, will get to these soon. > > On Tue, Feb 13, 2024 at 01:10:21PM -0600, Justin Pryzby wrote: > > I sent various patches to cfbot but haven't heard back. > > > https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com > > https://www.postgresql.org/message-id/flat/20220623193125.GB22452@telsasoft.com > > https://github.com/justinpryzby/cfbot/commits/master > > https://github.com/macdice/cfbot/pulls > > @Thomas: ping > > I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which > can make the builds 2x faster. I applied 0001 and 0002 to see ccache support on Windows but the build step failed with: 'ccache: error: No stats log has been configured'. Perhaps you forgot to add 'CCACHE_STATSLOG: $CCACHE_DIR.stats.log' to 0002? After adding that line, CI finished successfully. And, I confirm that the build step takes ~30 seconds now; it was ~90 seconds before that. -- Regards, Nazir Bilal Yavuz Microsoft
On Thu, Jun 13, 2024 at 02:38:46PM +0300, Nazir Bilal Yavuz wrote: > On Wed, 12 Jun 2024 at 16:10, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Fri, Jun 24, 2022 at 08:38:50AM +1200, Thomas Munro wrote: > > > > I've also sent some patches to Thomas for cfbot to help progress some of these > > > > patches (code coverage and documentation upload as artifacts). > > > > https://github.com/justinpryzby/cfbot/commits/master > > > > > > Thanks, sorry for lack of action, will get to these soon. > > > > On Tue, Feb 13, 2024 at 01:10:21PM -0600, Justin Pryzby wrote: > > > I sent various patches to cfbot but haven't heard back. > > > > > https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com > > > https://www.postgresql.org/message-id/flat/20220623193125.GB22452@telsasoft.com > > > https://github.com/justinpryzby/cfbot/commits/master > > > https://github.com/macdice/cfbot/pulls > > > > @Thomas: ping > > > > I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which > > can make the builds 2x faster. > > I applied 0001 and 0002 to see ccache support on Windows but the build > step failed with: 'ccache: error: No stats log has been configured'. > Perhaps you forgot to add 'CCACHE_STATSLOG: $CCACHE_DIR.stats.log' to > 0002? Something like that - I put the line back. I don't know if statslog should be included in the patch, but it's useful for demonstrating that it's working. ccache should be installed in the image rather than re-installed on each invocation. -- Justin
Attachment
On Thu, Jun 13, 2024 at 06:56:20AM -0500, Justin Pryzby wrote: > On Thu, Jun 13, 2024 at 02:38:46PM +0300, Nazir Bilal Yavuz wrote: >>> I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which >>> can make the builds 2x faster. >> >> I applied 0001 and 0002 to see ccache support on Windows but the build >> step failed with: 'ccache: error: No stats log has been configured'. >> Perhaps you forgot to add 'CCACHE_STATSLOG: $CCACHE_DIR.stats.log' to >> 0002? > > Something like that - I put the line back. I don't know if statslog > should be included in the patch, but it's useful for demonstrating that > it's working. > > ccache should be installed in the image rather than re-installed on each > invocation. Getting a 90s -> 30s improvement is nice. With such numbers, 0002 is worth considering first. + ninja -C build |tee build.txt In 0001, how OK is it to rely on the existence of tee for the VS2019 environments? The base images include it, meaning that it is OK? - REM choco install -y --no-progress ... I'd rather keep this line in 0002, as a matter of documentation. + set CC=c:\ProgramData\chocolatey\lib\ccache\tools\ccache-4.10-windows-x86_64\ccache.exe cl.exe As of https://docs.mesa3d.org/meson.html#compiler-specification, using CC is supported by meson (didn't know that), but shouldn't this be set in the "env:" part of the VS2019 task in .cirrus.tasks.yml? -- Michael
Attachment
Hi, On Thu, 13 Jun 2024 at 14:56, Justin Pryzby <pryzby@telsasoft.com> wrote: > > ccache should be installed in the image rather than re-installed on each > invocation. ccache is installed in the Windows VM images now [1]. It can be used as 'set CC=ccache.exe cl.exe' in the Windows CI task. [1] https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e -- Regards, Nazir Bilal Yavuz Microsoft
On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > On Thu, 13 Jun 2024 at 14:56, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > ccache should be installed in the image rather than re-installed on each > > invocation. > > ccache is installed in the Windows VM images now [1]. It can be used > as 'set CC=ccache.exe cl.exe' in the Windows CI task. > > [1] https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e Thanks. I think that works by using a "shim" created by choco in C:\ProgramData\chocolatey\bin. But going through that indirection seems to incur an extra 15sec of compilation time; I think we'll want to do something to avoid that. I'm not sure what the options are, like maybe installing ccache into a fixed path like c:\ccache or installing a custom link, to a "pinned" version of ccache. -- Justin
Hi, On June 14, 2024 8:22:01 AM PDT, Justin Pryzby <pryzby@telsasoft.com> wrote: >On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote: >> Hi, >> >> On Thu, 13 Jun 2024 at 14:56, Justin Pryzby <pryzby@telsasoft.com> wrote: >> > >> > ccache should be installed in the image rather than re-installed on each >> > invocation. >> >> ccache is installed in the Windows VM images now [1]. It can be used >> as 'set CC=ccache.exe cl.exe' in the Windows CI task. >> >> [1] https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e > >Thanks. I think that works by using a "shim" created by choco in >C:\ProgramData\chocolatey\bin. > >But going through that indirection seems to incur an extra 15sec of >compilation time; I think we'll want to do something to avoid that. > >I'm not sure what the options are, like maybe installing ccache into a >fixed path like c:\ccache or installing a custom link, to a "pinned" >version of ccache. Hm. There actually already is the mingw ccache installed. The images for mingw and msvc used to be separate (for startupperformance when using containers), but we just merged them. So it might be easiest to just explicitly use the ccachefrom there (also an explicit path might be faster). Could you check if that's fast? If not, we can just install thebinaries distributed by the project, it's just more work to keep up2date that way. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Fri, Jun 14, 2024 at 08:34:37AM -0700, Andres Freund wrote: > Hm. There actually already is the mingw ccache installed. The images for mingw and msvc used to be separate (for startupperformance when using containers), but we just merged them. So it might be easiest to just explicitly use the ccachefrom there > (also an explicit path might be faster). I don't think the path search is significant; it's fast so long as there's no choco stub involved. > Could you check if that's fast? Yes, it is. > If not, we can just install the binaries distributed by the project, it's just more work to keep up2date that way. I guess you mean a separate line to copy choco's ccache.exe somewhere. -- Justin
Attachment
Hi, On 2024-06-18 08:36:57 -0500, Justin Pryzby wrote: > On Fri, Jun 14, 2024 at 08:34:37AM -0700, Andres Freund wrote: > > Hm. There actually already is the mingw ccache installed. The images for mingw and msvc used to be separate (for startupperformance when using containers), but we just merged them. So it might be easiest to just explicitly use the ccachefrom there > > > (also an explicit path might be faster). > > I don't think the path search is significant; it's fast so long as > there's no choco stub involved. Comparatively it's definitely small, but I've seen it make a difference on windows. > > Could you check if that's fast? > > Yes, it is. Cool. > > If not, we can just install the binaries distributed by the project, it's just more work to keep up2date that way. > > I guess you mean a separate line to copy choco's ccache.exe somewhere. I was thinking of alternatively just using the binaries upstream distributes. But the mingw way seems easier. Perhaps we should add an environment variable pointing to ccache to the image, or such? > build_script: | > vcvarsall x64 > - ninja -C build > + ninja -C build |tee build.txt > + REM Since pipes lose the exit status of the preceding command, rerun the compilation > + REM without the pipe, exiting now if it fails, to avoid trying to run checks > + ninja -C build > nul Perhaps we could do something like (ninja -C build && touch success) | tee build.txt cat success ? > + CCACHE_MAXSIZE: "500M" Does it have to be this big to work? > configure_script: | > vcvarsall x64 > - meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Dinjection_points=true-Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR%-DPG_TEST_EXTRA="%PG_TEST_EXTRA%" build > + meson setup build --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Dinjection_points=true-Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR%-DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -Dc_args="/Z7" A comment explaining why /Z7 is necessary would be good. > From 3a399c6350ed2f341425431f184e382c3f46d981 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sat, 26 Feb 2022 19:39:10 -0600 > Subject: [PATCH 4/7] WIP: cirrus: upload changed html docs as artifacts > > This could be done on the client side (cfbot). One advantage of doing > it here is that fewer docs are uploaded - many patches won't upload docs > at all. > > https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com > https://www.postgresql.org/message-id/CAB8KJ=i4qmEuopQ+PCSMBzGd4O-Xv0FCnC+q1x7hN9hsdvkBug@mail.gmail.com > > https://cirrus-ci.com/task/5396696388599808 I still think that this runs the risk of increasing space utilization and thus increase frequency of caches/artifacts being purged. > + # The commit that this branch is rebased on. There's no easy way to find this. I don't think that's true, IIRC I've complained about it before. We can do something like: major_num=$(grep PG_MAJORVERSION_NUM build/src/include/pg_config.h|awk '{print $3}'); echo major: $major_num; if git rev-parse --quiet --verify origin/REL_${major_num}_STABLE > /dev/null ; then basebranch=origin/REL_${major_num}_STABLE; else basebranch=origin/master; fi; echo base branch: $basebranch base_commit=$(git merge-base HEAD $basebranch) Greetings, Andres Freund
On Tue, Jun 18, 2024 at 02:25:46PM -0700, Andres Freund wrote: > > > If not, we can just install the binaries distributed by the project, it's just more work to keep up2date that way. > > > > I guess you mean a separate line to copy choco's ccache.exe somewhere. > > I was thinking of alternatively just using the binaries upstream > distributes. But the mingw way seems easier. > > Perhaps we should add an environment variable pointing to ccache to the image, > or such? I guess you mean changing the OS image so there's a $CCACHE. That sounds fine... > > + CCACHE_MAXSIZE: "500M" > > Does it have to be this big to work? It's using 150MB for an initial compilation, and maxsize will need to be 20% larger for it to not evict its cache before it can be used. The other ccaches (except for mingw) seem to be several times larger than what's needed for a single compilation, which makes sense to cache across multiple branches (or even commits in a single branch), and for cfbot. > A comment explaining why /Z7 is necessary would be good. Sure > > build_script: | > > vcvarsall x64 > > - ninja -C build > > + ninja -C build |tee build.txt > > + REM Since pipes lose the exit status of the preceding command, rerun the compilation > > + REM without the pipe, exiting now if it fails, to avoid trying to run checks > > + ninja -C build > nul > > Perhaps we could do something like > (ninja -C build && touch success) | tee build.txt > cat success > ? I don't know -- a pipe alone seems more direct than a subshell+conditional+pipe written in cmd.exe, whose syntax is not well known here. Maybe you're suggesting to write sh -c "(ninja -C build && touch success) | tee build.txt ; cat ./success" But that's another layer of complexity .. for what ? > > Subject: [PATCH 4/7] WIP: cirrus: upload changed html docs as artifacts > > I still think that this runs the risk of increasing space utilization and thus > increase frequency of caches/artifacts being purged. Maybe it should run on the local macs where I think you can control that. > > + # The commit that this branch is rebased on. There's no easy way to find this. > > I don't think that's true, IIRC I've complained about it before. We can do > something like: cfbot now exposes it, so it'd be trivial for that case (although there was no response here to my inquiries about that). I'll revisit this in the future, once other patches have progressed. > major_num=$(grep PG_MAJORVERSION_NUM build/src/include/pg_config.h|awk '{print $3}'); > echo major: $major_num; > if git rev-parse --quiet --verify origin/REL_${major_num}_STABLE > /dev/null ; then > basebranch=origin/REL_${major_num}_STABLE; > else > basebranch=origin/master; > fi; > echo base branch: $basebranch > base_commit=$(git merge-base HEAD $basebranch)
Attachment
Hi, On 2024-08-06 14:10:15 -0500, Justin Pryzby wrote: > > > + # The commit that this branch is rebased on. There's no easy way to find this. > > > > I don't think that's true, IIRC I've complained about it before. We can do > > something like: > > cfbot now exposes it, so it'd be trivial for that case (although there > was no response here to my inquiries about that). I'll revisit this in > the future, once other patches have progressed. What do you mean with "cfbot now exposes that"? Greetings, Andres Freund
On Sat, Nov 16, 2024 at 03:22:57PM -0500, Andres Freund wrote: > Hi, > > On 2024-08-06 14:10:15 -0500, Justin Pryzby wrote: > > > > + # The commit that this branch is rebased on. There's no easy way to find this. > > > > > > I don't think that's true, IIRC I've complained about it before. We can do > > > something like: > > > > cfbot now exposes it, so it'd be trivial for that case (although there > > was no response here to my inquiries about that). I'll revisit this in > > the future, once other patches have progressed. > > What do you mean with "cfbot now exposes that"? cfbot now makes one single merge commit, indicating the patch on which the branch is based. https://github.com/macdice/cfbot/pull/18 Note that my own cfbot branch evidently never got looked at since 2.5 years ago, which has been a discouraging factor for me. 20220623192343.GA22452@telsasoft.com