Thread: Re: CI and test improvements

Re: CI and test improvements

From

Justin Pryzby

Date:

23 June 2022, 19:31:26

On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote:
> I'm "joining" a bunch of unresolved threads hoping to present them better since
> they're related and I'm maintaining them together anyway.

This resolves an error with libpq tests in an intermediate commit, probably
caused by rebasing (and maybe hidden by the fact that the tests weren't being
run).

And updates ccache to avoid CCACHE_COMPILER.

Should any of the test changes go into v15 ?

> Subject: [PATCH 02/19] cirrus/vcregress: test modules/contrib with NO_INSTALLCHECK=1
> Subject: [PATCH 08/19] vcregress: add alltaptests
> Subject: [PATCH 14/19] Move libpq_pipeline test into src/interfaces/libpq.
> Subject: [PATCH 15/19] msvc: do not install libpq test tools by default

I also added: cirrus/ccache: disable compression and show stats

Since v4.0, ccache enables zstd compression by default, saving roughly 2x-3x.
But, cirrus caches are compressed as tar.gz, so we could disable ccache
compression, allowing cirrus to gzip the uncompressed data (better than
ccache's default of zstd-1).

https://cirrus-ci.com/build/5194596123672576
debian/bullseye has ccache 4.2; cirrus says 92MB cache after a single compilation; cache_size_kibibyte 102000
macos: has 4.5.1: 46MB cache; cache_size_kibibyte        51252
freebsd: has 3.7.12: 41MB cache; cache_size_kibibyte     130112

For some reason, mac's ccache uses 2x less space than either freesbsd or linux.
Linux is ~30% larger.
Freebsd uploads an artifact 3x smaller than the size ccache reports, because
its ccache is old so doesn't enable compression by default.

I've also sent some patches to Thomas for cfbot to help progress some of these
patches (code coverage and documentation upload as artifacts).
https://github.com/justinpryzby/cfbot/commits/master

-- 
Justin

Attachment

Re: CI and test improvements

From

Thomas Munro

Date:

23 June 2022, 20:38:50

[Resending -- sorry if you receive this twice.  Jacob's mail server
has apparently offended the list management software so emails bounce
if he's in CC.]

On Fri, Jun 24, 2022 at 7:23 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Freebsd uploads an artifact 3x smaller than the size ccache reports, because
> its ccache is old so doesn't enable compression by default.

That port/package got stuck on 3.x because of some dependency problems
when using in bootstrapping FreeBSD itself (or other ports?),
apparently.  I didn't follow the details but the recent messages here
sound hopeful and I'm keeping my eye on it to see if 4.x lands as a
separate package we'd need to install, or something.  Fingers crossed!

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234971

> I've also sent some patches to Thomas for cfbot to help progress some of these
> patches (code coverage and documentation upload as artifacts).
> https://github.com/justinpryzby/cfbot/commits/master

Thanks, sorry for lack of action, will get to these soon.

Re: CI and test improvements

From

Justin Pryzby

Date:

28 August 2022, 14:44:47

Resending with a problematic address removed...

On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote:
> I'm anticipating the need to further re-arrange the patch set - it's not clear
> which patches should go first.  Maybe some patches should be dropped in favour
> of the meson project.  I guess some patches will have to be re-implemented with
> meson (msvc warnings).

> Maybe the improvements to vcregress should go into v15 ?  CI should run all the
> tests (which also serves to document *how* to run all the tests, even if there
> isn't a literal check-world target).

On Thu, Jun 23, 2022 at 02:31:25PM -0500, Justin Pryzby wrote:
> Should any of the test changes go into v15 ?

On Thu, Jul 07, 2022 at 07:22:32PM -0500, Justin Pryzby wrote:
> Also, cirrus/freebsd task can run 3x faster with more CPUs.

Checking if there's interest in any/none of these patches ?
I have added several more.

Do you have an idea when the meson branch might be merged ?

Will vcregress remain for a while, or will it go away for v16 ?

-- 
Justin

Attachment

Re: CI and test improvements

From

Andres Freund

Date:

28 August 2022, 16:07:52

Hi,

On 2022-08-28 09:44:47 -0500, Justin Pryzby wrote:
> On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote:
> > I'm anticipating the need to further re-arrange the patch set - it's not clear
> > which patches should go first.  Maybe some patches should be dropped in favour
> > of the meson project.  I guess some patches will have to be re-implemented with
> > meson (msvc warnings).
>
> > Maybe the improvements to vcregress should go into v15 ?  CI should run all the
> > tests (which also serves to document *how* to run all the tests, even if there
> > isn't a literal check-world target).
>
> On Thu, Jun 23, 2022 at 02:31:25PM -0500, Justin Pryzby wrote:
> > Should any of the test changes go into v15 ?
>
> On Thu, Jul 07, 2022 at 07:22:32PM -0500, Justin Pryzby wrote:
> > Also, cirrus/freebsd task can run 3x faster with more CPUs.
>
> Checking if there's interest in any/none of these patches ?
> I have added several more.
>
> Do you have an idea when the meson branch might be merged ?

I hope to do that fairly soon, but it's of course dependant on review etc. The
plan was to merge it early and mature it in tree to some degree. There's only
so much we can do "from the outside"...


> Will vcregress remain for a while, or will it go away for v16 ?

The plan was for the windows stuff to go away fairly quickly.


> From 99ee0bef5054539aad0e23a24dd9c9cc9ccee788 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Wed, 25 May 2022 21:53:22 -0500
> Subject: [PATCH 01/25] cirrus/windows: add compiler_warnings_script

Looks good.

> -    MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false
-nologo
> +    # -fileLoggerParameters1: write warnings to msbuild.warn.log.
> +    MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo
-fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log

Except, I think it'd be good to split this line. What do you think about using
something like
MSBFLAGS: >-
  -nologo
  -m -verbosity:minimal
  /p:TrackFileAccess=false
  "-consoleLoggerParameters:Summary;ForceNoAlign"
  -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log

I think that should work?


>      # If tests hang forever, cirrus eventually times out. In that case log
>      # output etc is not uploaded, making the problem hard to debug. Of course
> @@ -450,6 +451,11 @@ task:
>      cd src/tools/msvc
>      %T_C% perl vcregress.pl ecpgcheck
>
> +  # These should be last, so all the important checks are always run
> +  always:
> +    compiler_warnings_script:
> +      - sh src\tools\ci\windows-compiler-warnings msbuild.warn.log
> +
>    on_failure:
>      <<: *on_failure
>      crashlog_artifacts:
> diff --git a/src/tools/ci/windows-compiler-warnings b/src/tools/ci/windows-compiler-warnings
> new file mode 100755
> index 00000000000..d6f9a1fc569
> --- /dev/null
> +++ b/src/tools/ci/windows-compiler-warnings
> @@ -0,0 +1,16 @@
> +#! /bin/sh
> +# Success if the given file doesn't exist or is empty, else fail
> +# This is a separate file only to avoid dealing with windows shell quoting and escaping.
> +set -e
> +
> +fn=$1
> +
> +if [ -s "$fn" ]
> +then
> +    # Display the file's content, then exit indicating failure
> +    cat "$fn"
> +    exit 1
> +else
> +    # Success
> +    exit 0
> +fi
> --
> 2.17.1

Wouldn't that be doable as something like
sh -c 'if test -s file; then cat file;exit 1; fi"
inside .cirrus.yml?



> From 1064a0794e85e06b3a0eca4ed3765f078795cb36 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 3 Apr 2022 00:10:20 -0500
> Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats
>
> ccache since 4.0 enables zstd compression by default.
>
> With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
> linux has 4.2; 99MB cirrus cache; cache_size_kibibyte    109616
> macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte    52500
> freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte    134064
> windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte    51179
> todo: compiler warnings
>
> With compression disabled (https://cirrus-ci.com/task/4614182514458624):
> linux: 91MB cirrus cache; cache_size_kibibyte    316136
> macos: 41MB cirrus cache; cache_size_kibibyte    118068
> windows: 42MB cirrus cache; cache_size_kibibyte    134064
> freebsd is the same
>
> The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS,
> to avoid re-uploading cache tarball in a 100% cached build, due only to
> updating ./**/stats).
>
> Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.

I stared at this commit message for a while, trying to make sense of it, and
couldn't really. I assume you're saying that the cirrus compression is better
with ccache compression disabled, but it's extremely hard to parse out of it.

This does too much at once. Show stats, change cache sizes, disable
compression.



> From 01e9abd386a4e6cc0125b97617fb42e695898cbf Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Tue, 26 Jul 2022 20:30:02 -0500
> Subject: [PATCH 04/25] cirrus/ccache: add explicit cache keys..
>
> Since otherwise, building with ci-os-only will probably fail to use the normal
> cache, since the cache key is computed using both the task name and its *index*
> in the list of caches (internal/executor/cache.go:184).

Hm, perhaps worth confirming and/or reporting to cirrus rather?



> From 8de5c977686270b0a4e666a924ebe820a245913a Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 24 Jul 2022 23:09:12 -0500
> Subject: [PATCH 05/25] silence make distprep and generated-headers
>
> this saves vertical screen space.
>
> https://www.postgresql.org/message-id/20220221164736.rq3ornzjdkmwk2wo@alap3.anarazel.de

I don't feel this should go in as a part of CI changes. Or rather, I feel
uncomfortable committing it when just discussed under this subject.



> From eaf263ccaa8310c5d9834b97e93ad8434d63296e Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 24 Jul 2022 22:44:53 -0500
> Subject: [PATCH 06/25] pg_regress: run more quietly
>
> The number of lines of output should be closer to 1 per test, rather than 25 +
> 1 per test.
>
> https://www.postgresql.org/message-id/flat/20220221173109.yl6dqqu3ud52ripd%40alap3.anarazel.de

See above. There's also a dedicated thread about revising the output.


> From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Fri, 24 Jun 2022 00:09:12 -0500
> Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not
>  repartitiion
>
> There was some historic problem where tests under freebsd took 8+ minutes (and
> before 4a288a37f took 15 minutes).
>
> This reduces test time from 10min to 3min.
> 4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720
> 4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008
> 4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600
>
> 6 CPUs https://cirrus-ci.com/task/6678321684545536
> 8 CPUs https://cirrus-ci.com/task/6264854121021440
>
> See also:
>
https://www.postgresql.org/message-id/flat/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de#f36c0b17e33e31e7925e7e5812998686
> 8 jobs 7min https://cirrus-ci.com/task/6186376667332608
>
> xi-os-only: freebsd

Typo.


> @@ -71,8 +69,6 @@ task:
>      fingerprint_key: ccache/freebsd
>      reupload_on_changes: true
>
> -  # Workaround around performance issues due to 32KB block size
> -  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
>    create_user_script: |
>      pw useradd postgres
>      chown -R postgres:postgres .
> --

What's the story there - at some point that was important for performance
because of the native block size triggering significant read-modify-write
cycles with postres' writes. You didn't comment on it in the commit message.


> From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Wed, 27 Jul 2022 16:54:47 -0500
> Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath

> From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 13 Feb 2022 17:56:40 -0600
> Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min

No explanation?


> From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Fri, 26 Aug 2022 12:00:10 -0500
> Subject: [PATCH 15/25] f!and chdir

I don't see the point of pointing fixup commits to the list.

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

28 August 2022, 17:10:29

On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:
> > --- /dev/null
> > +++ b/src/tools/ci/windows-compiler-warnings
> 
> Wouldn't that be doable as something like
> sh -c 'if test -s file; then cat file;exit 1; fi"
> inside .cirrus.yml?

I had written it inline in a couple ways, like
- sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'

but then separated it out as you suggested in
20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de

after I complained about cmd.exe requiring escaping for && and ||
That makes writing any shell script a bit perilous and a separate script
seems better.

> > Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats
> >
> > ccache since 4.0 enables zstd compression by default.
> >
> > With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
> > linux has 4.2; 99MB cirrus cache; cache_size_kibibyte    109616
> > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte    52500
> > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte    134064
> > windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte    51179
> > todo: compiler warnings
> >
> > With compression disabled (https://cirrus-ci.com/task/4614182514458624):
> > linux: 91MB cirrus cache; cache_size_kibibyte    316136
> > macos: 41MB cirrus cache; cache_size_kibibyte    118068
> > windows: 42MB cirrus cache; cache_size_kibibyte    134064
> > freebsd is the same
> >
> > The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS,
> > to avoid re-uploading cache tarball in a 100% cached build, due only to
> > updating ./**/stats).
> >
> > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.
> 
> I stared at this commit message for a while, trying to make sense of it, and
> couldn't really. I assume you're saying that the cirrus compression is better
> with ccache compression disabled, but it's extremely hard to parse out of it.

Yes, because ccache uses zstd-1, and cirrus uses gzip, which it's going
to use no matter what ccache does, and gzip's default -6 is better than
ccache's zstd-1.

> This does too much at once. Show stats, change cache sizes, disable
> compression.

The cache size change is related to the compression level change; ccache
prunes based on the local size, which was compressed with zstd-1 and,
with this patch, not compressed (so ~2x larger).  Also, it's more
interesting to control the size uploaded to cirrus (after compression
ith gzip-6).

> > From 01e9abd386a4e6cc0125b97617fb42e695898cbf Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Tue, 26 Jul 2022 20:30:02 -0500
> > Subject: [PATCH 04/25] cirrus/ccache: add explicit cache keys..
> >
> > Since otherwise, building with ci-os-only will probably fail to use the normal
> > cache, since the cache key is computed using both the task name and its *index*
> > in the list of caches (internal/executor/cache.go:184).
> 
> Hm, perhaps worth confirming and/or reporting to cirrus rather?

I know because of reading their source.  Unfortunately, there's no
commit history indicating the intent or rationale.
https://github.com/cirruslabs/cirrus-ci-agent/blob/master/internal/executor/cache.go#L183

> > From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Fri, 24 Jun 2022 00:09:12 -0500
> > Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not
> >  repartitiion
> >
> > There was some historic problem where tests under freebsd took 8+ minutes (and
> > before 4a288a37f took 15 minutes).
> >
> > This reduces test time from 10min to 3min.
> > 4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720
> > 4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008
> > 4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600
> >
> > 6 CPUs https://cirrus-ci.com/task/6678321684545536
> > 8 CPUs https://cirrus-ci.com/task/6264854121021440
> >
> > See also:
> >
https://www.postgresql.org/message-id/flat/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de#f36c0b17e33e31e7925e7e5812998686
> > 8 jobs 7min https://cirrus-ci.com/task/6186376667332608
> >
> > xi-os-only: freebsd
> 
> Typo.

No - it's deliberate so I can switch to and from "everything" to "this
only".

> > @@ -71,8 +69,6 @@ task:
> >      fingerprint_key: ccache/freebsd
> >      reupload_on_changes: true
> >
> > -  # Workaround around performance issues due to 32KB block size
> > -  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> >    create_user_script: |
> >      pw useradd postgres
> >      chown -R postgres:postgres .
> > --
> 
> What's the story there - at some point that was important for performance
> because of the native block size triggering significant read-modify-write
> cycles with postres' writes. You didn't comment on it in the commit message.

Well, I don't know the history, but it seems to be unneeded now.

Is there a good description of the original problem ?  Originally,
freebsd check-world took ~15min to run tests, and when we changed to use
-Og it took 10min.  Since then, seems to have improved on its own, and
currently takes ~6min.  This patch adds CPUs to make it run in ~4min,
and takes the opportuity to drop the historic repartition stuff.

> > From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Wed, 27 Jul 2022 16:54:47 -0500
> > Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath
> 
> > From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Sun, 13 Feb 2022 17:56:40 -0600
> > Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min
> 
> No explanation?

Because of the immediately following commit which makes it run all the
tests.

> > From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Fri, 26 Aug 2022 12:00:10 -0500
> > Subject: [PATCH 15/25] f!and chdir
> 
> I don't see the point of pointing fixup commits to the list.

It's a separate commit to make it easy to see the changes, separately,
since I imagine maybe the "chdir" part won't be desirable, or maybe the
PATH part won't.  But I'm not sure, so I'm here soliciting feedback.

-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

28 August 2022, 21:28:02

Hi,

On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote:
> On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:
> > > --- /dev/null
> > > +++ b/src/tools/ci/windows-compiler-warnings
> >
> > Wouldn't that be doable as something like
> > sh -c 'if test -s file; then cat file;exit 1; fi"
> > inside .cirrus.yml?
>
> I had written it inline in a couple ways, like
> - sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'
>
> but then separated it out as you suggested in
> 20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de
>
> after I complained about cmd.exe requiring escaping for && and ||
> That makes writing any shell script a bit perilous and a separate script
> seems better.

I remember that I suggested it - but note that the way I wrote above doesn't
have anything needing escaping. Anyway, what do you think of the multiline
split I suggested?


> > > Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats
> > >
> > > ccache since 4.0 enables zstd compression by default.
> > >
> > > With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
> > > linux has 4.2; 99MB cirrus cache; cache_size_kibibyte    109616
> > > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte    52500
> > > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte    134064
> > > windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte    51179
> > > todo: compiler warnings
> > >
> > > With compression disabled (https://cirrus-ci.com/task/4614182514458624):
> > > linux: 91MB cirrus cache; cache_size_kibibyte    316136
> > > macos: 41MB cirrus cache; cache_size_kibibyte    118068
> > > windows: 42MB cirrus cache; cache_size_kibibyte    134064
> > > freebsd is the same
> > >
> > > The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS,
> > > to avoid re-uploading cache tarball in a 100% cached build, due only to
> > > updating ./**/stats).
> > >
> > > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.
> >
> > I stared at this commit message for a while, trying to make sense of it, and
> > couldn't really. I assume you're saying that the cirrus compression is better
> > with ccache compression disabled, but it's extremely hard to parse out of it.
>
> Yes, because ccache uses zstd-1, and cirrus uses gzip, which it's going
> to use no matter what ccache does, and gzip's default -6 is better than
> ccache's zstd-1.
>
> > This does too much at once. Show stats, change cache sizes, disable
> > compression.
>
> The cache size change is related to the compression level change; ccache
> prunes based on the local size, which was compressed with zstd-1 and,
> with this patch, not compressed (so ~2x larger).  Also, it's more
> interesting to control the size uploaded to cirrus (after compression
> ith gzip-6).

That's what should have been in the commit message.


> > > From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001
> > > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > > Date: Fri, 24 Jun 2022 00:09:12 -0500
> > > Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not
> > >  repartitiion
> > >
> > > There was some historic problem where tests under freebsd took 8+ minutes (and
> > > before 4a288a37f took 15 minutes).
> > >
> > > This reduces test time from 10min to 3min.
> > > 4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720
> > > 4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008
> > > 4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600
> > >
> > > 6 CPUs https://cirrus-ci.com/task/6678321684545536
> > > 8 CPUs https://cirrus-ci.com/task/6264854121021440
> > >
> > > See also:
> > >
https://www.postgresql.org/message-id/flat/20220310033347.hgxk4pyarzq4hxwp@alap3.anarazel.de#f36c0b17e33e31e7925e7e5812998686
> > > 8 jobs 7min https://cirrus-ci.com/task/6186376667332608
> > >
> > > xi-os-only: freebsd
> >
> > Typo.
>
> No - it's deliberate so I can switch to and from "everything" to "this
> only".

I don't see the point in posting patches to be applied if they contain lots of
such things that a potential committer would need to catch and include a lot
of of fixup patches.


> > > @@ -71,8 +69,6 @@ task:
> > >      fingerprint_key: ccache/freebsd
> > >      reupload_on_changes: true
> > >
> > > -  # Workaround around performance issues due to 32KB block size
> > > -  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> > >    create_user_script: |
> > >      pw useradd postgres
> > >      chown -R postgres:postgres .
> > > --
> >
> > What's the story there - at some point that was important for performance
> > because of the native block size triggering significant read-modify-write
> > cycles with postres' writes. You didn't comment on it in the commit message.
>
> Well, I don't know the history, but it seems to be unneeded now.

It's possible it was mainly needed for testing with aio + dio. But also
possible that an upgrade improved the situation since.


> > > From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001
> > > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > > Date: Wed, 27 Jul 2022 16:54:47 -0500
> > > Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath
> >
> > > From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001
> > > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > > Date: Sun, 13 Feb 2022 17:56:40 -0600
> > > Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min
> >
> > No explanation?
>
> Because of the immediately following commit which makes it run all the
> tests.

Mention that in the commit message then. Especially when dealing with 25
commits I don't think you can expect others to infer such things.


> > > From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001
> > > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > > Date: Fri, 26 Aug 2022 12:00:10 -0500
> > > Subject: [PATCH 15/25] f!and chdir
> >
> > I don't see the point of pointing fixup commits to the list.
>
> It's a separate commit to make it easy to see the changes, separately,
> since I imagine maybe the "chdir" part won't be desirable, or maybe the
> PATH part won't.  But I'm not sure, so I'm here soliciting feedback.

Shrug, I doubt you'll get much if asked that way.

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

10 September 2022, 20:05:42

On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:
> On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote:
> > On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:
> > > > --- /dev/null
> > > > +++ b/src/tools/ci/windows-compiler-warnings
> > >
> > > Wouldn't that be doable as something like
> > > sh -c 'if test -s file; then cat file;exit 1; fi"
> > > inside .cirrus.yml?
> >
> > I had written it inline in a couple ways, like
> > - sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'
> >
> > but then separated it out as you suggested in
> > 20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de
> >
> > after I complained about cmd.exe requiring escaping for && and ||
> > That makes writing any shell script a bit perilous and a separate script
> > seems better.
> 
> I remember that I suggested it - but note that the way I wrote above doesn't
> have anything needing escaping. 

It doesn't require it, but that still gives the impression that it's
normally possible to write one-liner shell scripts there, which is
misleading/wrong, and the reason why I took your suggestion to use a
separate script file.

> Anyway, what do you think of the multiline split I suggested?

Done, and sorted.

> That's what should have been in the commit message.

Sure.  I copied into the commit message the explanation that I had
written in June's email.

> > > > xi-os-only: freebsd
> > >
> > > Typo.
> >
> > No - it's deliberate so I can switch to and from "everything" to "this
> > only".
> 
> I don't see the point in posting patches to be applied if they contain lots of
> such things that a potential committer would need to catch and include a lot
> of of fixup patches.

I get that you disliked that I disabled the effect of a CI tag by
munging "c" to "x".  I've amended the message to avoid confusion.  But,
lots of what such things ?  "ci-os-only" would be removed before being
pushed anyway.

"catching things" is the first part of the review process, which (as I
understand) is intended to help patch authors to improve their patches.
If you found lots of problems in my patches, I'd need to know about
them; but most of what I heard seem like quibbles about the presentation
of the patches.  It's true that some parts are dirty/unclear, and that
seems reasonable for patches most of which haven't yet received review,
for which I asked whether to pursue the patch at all, and how best to
present them.  This is (or could be) an opportunity to make
improvements.

I renamed the two, related patches to Cluser.pm which said "f!", which
are deliberately separate but looked like "fixup" patches.  Are you
interested in any combination of those three, related changes to move
logic from Makefile to perl ?  If not, we don't need to debate the
merits of spliting the patch.

What about the three, related changes for ccache compression ?

Should these be dropped in favour of meson ?
 - cirrus/vcregress: test modules/contrib with NO_INSTALLCHECK=1
 - vcregress: add alltaptests

I added: "WIP: skip building if only docs have changed"

changesInclude() didn't seem to work right when I first tried to use it.
Eventually, I realized that it seems to use something like "git log",
and not "git diff" (as I'd thought).  It seems to work fine now that I
know what to expect.

git commit --amend --no-edit
git diff --stat @{1}..@{0} # this outputs nothing
git log  --stat @{1}..@{0} # this lists the files changed by the tip commit

It'd be nice to be have cfbot inject this patch into each commitfest
patch for awhile, to make sure everything works as expected.  Same for
the code coverage patch and the doc artifacts patch.  (These patches
currently assume that the base commit is HEAD~1, which is correct for
cfbot, and that would also provide code coverage and docs until such
time as cfbot is updated to apply and preserve the original series of
patches).

-- 
Justin

Attachment

Re: CI and test improvements

From

Justin Pryzby

Date:

22 September 2022, 21:07:02

On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:
> > > > @@ -71,8 +69,6 @@ task:
> > > >      fingerprint_key: ccache/freebsd
> > > >      reupload_on_changes: true
> > > >
> > > > -  # Workaround around performance issues due to 32KB block size
> > > > -  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> > > >    create_user_script: |
> > > >      pw useradd postgres
> > > >      chown -R postgres:postgres .
> > > > --
> > >
> > > What's the story there - at some point that was important for performance
> > > because of the native block size triggering significant read-modify-write
> > > cycles with postres' writes. You didn't comment on it in the commit message.
> >
> > Well, I don't know the history, but it seems to be unneeded now.
> 
> It's possible it was mainly needed for testing with aio + dio. But also
> possible that an upgrade improved the situation since.

Maybe freebsd got faster as a result of the TAU CPUs?
https://mobile.twitter.com/cirrus_labs/status/1534982111568052240

I noticed because it's been *slower* the last ~24h since cirrusci
disabled TAU, as Thomas commit mentioned.
https://twitter.com/cirrus_labs/status/1572657320093712384

For example this CF entry:

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3736
https://cirrus-ci.com/task/4670794365140992 5m36s - 4days ago
https://cirrus-ci.com/task/4974926233862144 5m25s - 3days ago
https://cirrus-ci.com/task/5561409034518528 5m29s - 2days ago
https://cirrus-ci.com/task/6432442008469504 9m19s - yesterday

CF_BOT's latest tasks seem to be fast again, since 1-2h ago.
https://cirrus-ci.com/build/5178906041909248 9m1s
https://cirrus-ci.com/build/4593160281128960 5m8s
https://cirrus-ci.com/build/4539845644124160 5m22s

The logs for July show when freebsd started "being fast":
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3708
https://cirrus-ci.com/task/6316073015312384 10m25s Jul 13
https://cirrus-ci.com/task/5662878987452416 5m48s Jul 15

Maybe that changed in July rather than June because the TAU CPUs were
still not available in every region/zone (?)
https://cloud.google.com/compute/docs/regions-zones/

I have no idea if the TAU CPUs eliminate/mitigate the original
performance issue you had with AIO.  But they have such a large effect
on freebsd that it could now be the fastest task, if given more than 2
CPUs.

-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

22 September 2022, 21:53:23

Hi,

On 2022-09-22 16:07:02 -0500, Justin Pryzby wrote:
> On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:
> > > > > @@ -71,8 +69,6 @@ task:
> > > > >      fingerprint_key: ccache/freebsd
> > > > >      reupload_on_changes: true
> > > > >
> > > > > -  # Workaround around performance issues due to 32KB block size
> > > > > -  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> > > > >    create_user_script: |
> > > > >      pw useradd postgres
> > > > >      chown -R postgres:postgres .
> > > > > --
> > > >
> > > > What's the story there - at some point that was important for performance
> > > > because of the native block size triggering significant read-modify-write
> > > > cycles with postres' writes. You didn't comment on it in the commit message.
> > >
> > > Well, I don't know the history, but it seems to be unneeded now.
> > 
> > It's possible it was mainly needed for testing with aio + dio. But also
> > possible that an upgrade improved the situation since.
> 
> Maybe freebsd got faster as a result of the TAU CPUs?
> https://mobile.twitter.com/cirrus_labs/status/1534982111568052240
> 
> I noticed because it's been *slower* the last ~24h since cirrusci
> disabled TAU, as Thomas commit mentioned.
> https://twitter.com/cirrus_labs/status/1572657320093712384

Yea, I noticed that as well. It's entirely possible that something in the
"hardware" stack improved sufficiently to avoid problems.


> I have no idea if the TAU CPUs eliminate/mitigate the original
> performance issue you had with AIO.  But they have such a large effect
> on freebsd that it could now be the fastest task, if given more than 2
> CPUs.

I'm planning to rebase early next week and try that out.

Greetings,

Andres Freund

Re: CI and test improvements

From

Thomas Munro

Date:

23 September 2022, 03:32:48

On Fri, Sep 23, 2022 at 9:07 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:
> > > > > -  # Workaround around performance issues due to 32KB block size
> > > > > -  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> > > > >    create_user_script: |
> > > > >      pw useradd postgres
> > > > >      chown -R postgres:postgres .
> > > >
> > > > What's the story there - at some point that was important for performance
> > > > because of the native block size triggering significant read-modify-write
> > > > cycles with postres' writes. You didn't comment on it in the commit message.
> > >
> > > Well, I don't know the history, but it seems to be unneeded now.
> >
> > It's possible it was mainly needed for testing with aio + dio. But also
> > possible that an upgrade improved the situation since.

Yeah, it is very important for direct I/O (patches soon...), because
every 8KB random write becomes a read-32KB/wait/write-32KB without it.
It's slightly less important for buffered I/O, because the kernel
caches hide that, but it still triggers I/O bandwidth amplification,
and we definitely saw positive effects earlier on the CI system back
on the previous generation.  FWIW I am planning to see about getting
the FreeBSD installer to create the root file system the way we want,
instead of this ugliness.

> Maybe freebsd got faster as a result of the TAU CPUs?
> [data]

Very interesting.  Would be good to find the exact instance types +
storage types to see if there has been a massive IOPS boost, perhaps
via local SSD.  The newer times are getting closer to a local
developer machine.

Re: CI and test improvements

From

Andres Freund

Date:

02 October 2022, 00:45:01

Hi,

On 2022-09-10 15:05:42 -0500, Justin Pryzby wrote:
> From 4ed5eb427de4508a4c3422e60891b45c8512814a Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 3 Apr 2022 00:10:20 -0500
> Subject: [PATCH 03/23] cirrus/ccache: disable compression and show stats
> 
> Since v4.0, ccache enables zstd compression by default, saving roughly
> 2x-3x.  But, cirrus caches are compressed as tar.gz, so we could disable
> ccache compression, allowing cirrus to gzip the uncompressed data
> (better than ccache's default of zstd-1).

I wonder whether we could instead change CCACHE_COMPRESSLEVEL (maybe 3, zstd's
default IIRC). It'd be good if we could increase cache utilization.


> From 0bd5f51b8c143ed87a867987309d66b8554b1fd6 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Thu, 14 Apr 2022 06:27:07 -0500
> Subject: [PATCH 05/23] cirrus: enable various runtime checks on macos and
>  freebsd
> 
> windows is slower than freebsd and mac, so it's okay to enable options which
> will slow them down some.  Also, the cirrusci mac instances always have lot of
> cores available.

> See:
> https://www.postgresql.org/message-id/20211217193159.pwrelhiyx7kevgsn@alap3.anarazel.de
> https://www.postgresql.org/message-id/20211213211223.vkgg3wwiss2tragj%40alap3.anarazel.de
> https://www.postgresql.org/message-id/CAH2-WzmevBhKNEtqX3N-Tkb0gVBHH62C0KfeTxXzqYES_PiFiA%40mail.gmail.com
> https://www.postgresql.org/message-id/20220325000933.vgazz7pjk2ytj65d@alap3.anarazel.de
> 
> ci-os-only: freebsd, macos
> ---
>  .cirrus.yml | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 183e8746ce6..4ad20892eeb 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -113,7 +113,9 @@ task:
>          \
>          CC="ccache cc" \
>          CXX="ccache c++" \
> -        CFLAGS="-Og -ggdb"
> +        CPPFLAGS="-DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST"\
 
> +        CXXFLAGS="-Og -ggdb -march=native -mtune=native" \
> +        CFLAGS="-Og -ggdb -march=native -mtune=native"

What's reason for -march=native -mtune=native here?


>      EOF
>    build_script: |
>      su postgres -c "ccache --zero-stats"
> @@ -336,8 +338,8 @@ task:
>        CC="ccache cc" \
>        CXX="ccache c++" \
>        CLANG="ccache ${brewpath}/llvm/bin/ccache" \
> -      CFLAGS="-Og -ggdb" \
> -      CXXFLAGS="-Og -ggdb" \
> +      CFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \
> +      CXXFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \
>        \
>        LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
>        PYTHON=python3

I'd also use CPPFLAGS here, given you'd used it above...

I'm planning to commit an updated version of this change soon, without the
-march=native -mtune=native bit, unless somebody protests...

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

02 October 2022, 00:58:01

On Sat, Oct 01, 2022 at 05:45:01PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-09-10 15:05:42 -0500, Justin Pryzby wrote:
> > From 4ed5eb427de4508a4c3422e60891b45c8512814a Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Sun, 3 Apr 2022 00:10:20 -0500
> > Subject: [PATCH 03/23] cirrus/ccache: disable compression and show stats
> > 
> > Since v4.0, ccache enables zstd compression by default, saving roughly
> > 2x-3x.  But, cirrus caches are compressed as tar.gz, so we could disable
> > ccache compression, allowing cirrus to gzip the uncompressed data
> > (better than ccache's default of zstd-1).
> 
> I wonder whether we could instead change CCACHE_COMPRESSLEVEL (maybe 3, zstd's
> default IIRC). It'd be good if we could increase cache utilization.

I considered that (and I think that's what I wrote initially).

I figured that if cirrus is going to use gzip-6 (tar.gz) in any case, we
might as well disable compression.  Then, all the tasks are also doing
the same thing (half the tasks have ccache before 4.0).

> > From 0bd5f51b8c143ed87a867987309d66b8554b1fd6 Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Thu, 14 Apr 2022 06:27:07 -0500
> > Subject: [PATCH 05/23] cirrus: enable various runtime checks on macos and
> >  freebsd
> > 
> > windows is slower than freebsd and mac, so it's okay to enable options which
> > will slow them down some.  Also, the cirrusci mac instances always have lot of
> > cores available.
> 
> > See:
> > https://www.postgresql.org/message-id/20211217193159.pwrelhiyx7kevgsn@alap3.anarazel.de
> > https://www.postgresql.org/message-id/20211213211223.vkgg3wwiss2tragj%40alap3.anarazel.de
> > https://www.postgresql.org/message-id/CAH2-WzmevBhKNEtqX3N-Tkb0gVBHH62C0KfeTxXzqYES_PiFiA%40mail.gmail.com
> > https://www.postgresql.org/message-id/20220325000933.vgazz7pjk2ytj65d@alap3.anarazel.de
> > 
> > ci-os-only: freebsd, macos
> > ---
> >  .cirrus.yml | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/.cirrus.yml b/.cirrus.yml
> > index 183e8746ce6..4ad20892eeb 100644
> > --- a/.cirrus.yml
> > +++ b/.cirrus.yml
> > @@ -113,7 +113,9 @@ task:
> >          \
> >          CC="ccache cc" \
> >          CXX="ccache c++" \
> > -        CFLAGS="-Og -ggdb"
> > +        CPPFLAGS="-DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST"\
 
> > +        CXXFLAGS="-Og -ggdb -march=native -mtune=native" \
> > +        CFLAGS="-Og -ggdb -march=native -mtune=native"
> 
> What's reason for -march=native -mtune=native here?

No particular reason, and my initial patch didn't have it.
I suppose I added it to test its effect and never got rid of it.

> >      EOF
> >    build_script: |
> >      su postgres -c "ccache --zero-stats"
> > @@ -336,8 +338,8 @@ task:
> >        CC="ccache cc" \
> >        CXX="ccache c++" \
> >        CLANG="ccache ${brewpath}/llvm/bin/ccache" \
> > -      CFLAGS="-Og -ggdb" \
> > -      CXXFLAGS="-Og -ggdb" \
> > +      CFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \
> > +      CXXFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \
> >        \
> >        LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
> >        PYTHON=python3
> 
> I'd also use CPPFLAGS here, given you'd used it above...
> 
> I'm planning to commit an updated version of this change soon, without the
> -march=native -mtune=native bit, unless somebody protests...

One other thing is that your -m32 changes caused the linux/meson task to
take an additional 3+ minutes (total ~8).  That's no issue, except that
the Warnings task depends on the linux/mason task, and itself can take
up to 15 minutes.

So those two potentially take as long as the windows task.
I suggested that CompileWarnings could instead "Depend on: Freebsd",
which currently takes 6-7min (and could take 4-5min if given more CPUs).

-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

02 October 2022, 01:36:41

Hi,

On 2022-10-01 19:58:01 -0500, Justin Pryzby wrote:
> One other thing is that your -m32 changes caused the linux/meson task to
> take an additional 3+ minutes (total ~8).  That's no issue, except that
> the Warnings task depends on the linux/mason task, and itself can take
> up to 15 minutes.

> So those two potentially take as long as the windows task.
> I suggested that CompileWarnings could instead "Depend on: Freebsd",
> which currently takes 6-7min (and could take 4-5min if given more CPUs).

I am wondering if we should instead introduce a new "quickcheck" task that
just compiles and runs maybe one test and have *all* other tests depend on
that.  Wasting a precious available windows instance to just fail to build or
immediately fail during tests doesn't really make sense.

Greetings,

Andres Freund

Re: CI and test improvements

From

Andres Freund

Date:

02 October 2022, 20:52:01

Hi,

On 2022-10-01 18:36:41 -0700, Andres Freund wrote:
> I am wondering if we should instead introduce a new "quickcheck" task that
> just compiles and runs maybe one test and have *all* other tests depend on
> that.  Wasting a precious available windows instance to just fail to build or
> immediately fail during tests doesn't really make sense.

Attached is an implementation of that idea.

I fairly randomly chose two quick tests to execute as part of the sanity
check, cube/regress pg_ctl/001_start_stop. I wanted to have coverage for
initdb, a pg_regress style test, a tap test, some other client binary.

With a primed cache this takes ~32s, not too bad imo. 12s of that is cloning
the repo.

What do you think?

We could bake a bare repo into the images to make the clone step in faster,
but that'd be for later anyway.

set -e
rm -rf /tmp/pg-clone-better
mkdir /tmp/pg-clone-better
cd /tmp/pg-clone-better
git init --bare
git remote add origin https://github.com/postgres/postgres.git --no-tags -t 'REL_*' -t master
git fetch -v
git repack -ad -f
du -sh

results in a 227MB repo.

git clone https://github.com/anarazel/postgres.git -v --depth 1000 -b ci-sanitycheck --reference /tmp/pg-clone-better
/tmp/pg-clone-better-clone

clones an example branch in ~1.35s.

Greetings,

Andres Freund

Attachment

v1-0001-ci-Introduce-a-SanityCheck-task-that-other-tasks-.patch

Re: CI and test improvements

From

Justin Pryzby

Date:

02 October 2022, 21:35:06

On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-10-01 18:36:41 -0700, Andres Freund wrote:
> > I am wondering if we should instead introduce a new "quickcheck" task that
> > just compiles and runs maybe one test and have *all* other tests depend on
> > that.  Wasting a precious available windows instance to just fail to build or
> > immediately fail during tests doesn't really make sense.

> With a primed cache this takes ~32s, not too bad imo. 12s of that is
> cloning the repo.

Maybe - that would avoid waiting 4 minutes for a windows instance to
start in the (hopefully atypical) case of a patch that fails in 1-2
minutes under linux/freebsd.

If the patch were completely broken, the windows task would take ~4min
to start, plus up to ~4min before failing to compile or failing an early
test.  6-8 minutes isn't nothing, but doesn't seem worth the added
complexity.

Also, this would mean that in the common case, the slowest task would be
delayed until after the SanityCheck task instance starts, compiles, and
runs some test :( Your best case is 32sec, but I doubt that's going to
be typical.

I was thinking about the idea of cfbot handling "tasks" separately,
similar to what it used to do with travis/appveyor.  The logic for
"windows tasks are only run if linux passes tests" could live there.
That could also be useful if there's ever the possibility of running an
additional OS on another CI provider, or if another provider can run
windows tasks faster, or if we need to reduce our load/dependency on
cirrus.  I realized that goes backwards in some ways to the direction
we've gone with cirrus, and I'm not sure how exactly it would do that (I
suppose it might add ci-os-only tags to its commit message).

> +    # no options enabled, should be small
> +    CCACHE_MAXSIZE: "150M"

Actually, tasks can share caches if the "cache key" is set.

If there was a separate "Sanity" task, I think it should use whatever
flags linux (or freebsd) use to avoid doing two compilations (with lots
of cache misses for patches which modify *.h files, which would then
happen twice, in serial).

> +  # use always: to continue after failures. Task that did not run count as a
> +  # success, so we need to recheck SanityChecks's condition here ...

> -  # task that did not run, count as a success, so we need to recheck Linux'
> -  # condition here ...

Another/better justification/description is that "cirrus warns if the
depending task has different only_if conditions than the dependant task".

-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

02 October 2022, 21:51:23

Hi,

On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote:
> Maybe - that would avoid waiting 4 minutes for a windows instance to
> start in the (hopefully atypical) case of a patch that fails in 1-2
> minutes under linux/freebsd.
> 
> If the patch were completely broken, the windows task would take ~4min
> to start, plus up to ~4min before failing to compile or failing an early
> test.  6-8 minutes isn't nothing, but doesn't seem worth the added
> complexity.

Avoiding 6-8mins of wasted windows time would, I think, allow us to crank
cfbot's concurrency up a notch or two.


> Also, this would mean that in the common case, the slowest task would be
> delayed until after the SanityCheck task instance starts, compiles, and
> runs some test :( Your best case is 32sec, but I doubt that's going to
> be typical.

Even the worst case isn't that bad, the uncached minimal build is 67s.


> I was thinking about the idea of cfbot handling "tasks" separately,
> similar to what it used to do with travis/appveyor.  The logic for
> "windows tasks are only run if linux passes tests" could live there.

I don't really see the advantage of doing that over just increasing
concurrency by a bit.


> > +    # no options enabled, should be small
> > +    CCACHE_MAXSIZE: "150M"
> 
> Actually, tasks can share caches if the "cache key" is set.
> If there was a separate "Sanity" task, I think it should use whatever
> flags linux (or freebsd) use to avoid doing two compilations (with lots
> of cache misses for patches which modify *.h files, which would then
> happen twice, in serial).

I think the price of using exactly the same flags is higher than the gain. And
it'll rarely work if we use the container task for the sanity check, as the
timestamps of the compiler, system headers etc will be different.


> > +  # use always: to continue after failures. Task that did not run count as a
> > +  # success, so we need to recheck SanityChecks's condition here ...
> 
> > -  # task that did not run, count as a success, so we need to recheck Linux'
> > -  # condition here ...
> 
> Another/better justification/description is that "cirrus warns if the
> depending task has different only_if conditions than the dependant task".

That doesn't really seem easier to understand to me.

Greetings,

Andres Freund

Re: CI and test improvements

From

Andres Freund

Date:

02 October 2022, 21:54:21

Hi,

On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote:
> On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote:
> > On 2022-10-01 18:36:41 -0700, Andres Freund wrote:
> > > I am wondering if we should instead introduce a new "quickcheck" task that
> > > just compiles and runs maybe one test and have *all* other tests depend on
> > > that.  Wasting a precious available windows instance to just fail to build or
> > > immediately fail during tests doesn't really make sense.
> 
> > With a primed cache this takes ~32s, not too bad imo. 12s of that is
> > cloning the repo.
> 
> Maybe - that would avoid waiting 4 minutes for a windows instance to
> start in the (hopefully atypical) case of a patch that fails in 1-2
> minutes under linux/freebsd.
> 
> If the patch were completely broken, the windows task would take ~4min
> to start, plus up to ~4min before failing to compile or failing an early
> test.  6-8 minutes isn't nothing, but doesn't seem worth the added
> complexity.

Btw, the motivation to work on this just now was that I'd like to enable more
sanitizers (undefined,alignment for linux-meson, address for
linux-autoconf). Yes, we could make the dependency on freebsd instead, but I'd
like to try to enable the clang-only memory sanitizer there (if it works on
freebsd)...

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

03 October 2022, 02:15:41

On Sun, Oct 02, 2022 at 02:54:21PM -0700, Andres Freund wrote:
> the clang-only memory sanitizer there (if it works on freebsd)...

Have you looked at this much ?  I think it'll require a bunch of
exclusions, right ?

-- 
Justin

Re: CI and test improvements

From

Justin Pryzby

Date:

04 November 2022, 23:54:12

On Sat, Sep 10, 2022 at 03:05:42PM -0500, Justin Pryzby wrote:
> On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:
> > On 2022-08-28 12:10:29 -0500, Justin Pryzby wrote:
> > > On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:
> > > > > --- /dev/null
> > > > > +++ b/src/tools/ci/windows-compiler-warnings
> > > >
> > > > Wouldn't that be doable as something like
> > > > sh -c 'if test -s file; then cat file;exit 1; fi"
> > > > inside .cirrus.yml?
> > >
> > > I had written it inline in a couple ways, like
> > > - sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'
> > >
> > > but then separated it out as you suggested in
> > > 20220227010908.vz2a7dmfzgwg742w@alap3.anarazel.de
> > >
> > > after I complained about cmd.exe requiring escaping for && and ||
> > > That makes writing any shell script a bit perilous and a separate script
> > > seems better.
> > 
> > I remember that I suggested it - but note that the way I wrote above doesn't
> > have anything needing escaping. 
> 
> It doesn't require it, but that still gives the impression that it's
> normally possible to write one-liner shell scripts there, which is
> misleading/wrong, and the reason why I took your suggestion to use a
> separate script file.
> 
> > Anyway, what do you think of the multiline split I suggested?
> 
> Done, and sorted.

Rewrote this and rebased some of the other stuff on top of the meson
commit, for which I also include some new patches.

Attachment

Re: CI and test improvements

From

Andres Freund

Date:

05 November 2022, 01:59:46

Hi,

On 2022-11-04 18:54:12 -0500, Justin Pryzby wrote:
> Subject: [PATCH 1/8] meson: PROVE is not required
> Subject: [PATCH 3/8] meson: rename 'main' tasks to 'regress' and 'isolation'

Pushed, thanks for the patches.


> Subject: [PATCH 2/8] meson: other fixes for cygwin
>
> XXX: what about HAVE_BUGGY_STRTOF ?

What are you wondering about here? Shouldn't that continue to be set via
src/include/port/cygwin.h?


> diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
> index 3dcfc11278f..6ec3c77af53 100644
> --- a/src/test/regress/meson.build
> +++ b/src/test/regress/meson.build
> @@ -10,7 +10,7 @@ regress_sources = pg_regress_c + files(
>  # patterns like ".*-.*-mingw.*". We probably can do better, but for now just
>  # replace 'gcc' with 'mingw' on windows.
>  host_tuple_cc = cc.get_id()
> -if host_system == 'windows' and host_tuple_cc == 'gcc'
> +if host_system in ['windows', 'cygwin'] and host_tuple_cc == 'gcc'
>    host_tuple_cc = 'mingw'
>  endif

This doesn't quite seem right - shouldn't it say cywin? Not that it makes a
difference right now, given the contents of resultmap:
float4:out:.*-.*-cygwin.*=float4-misrounded-input.out
float4:out:.*-.*-mingw.*=float4-misrounded-input.out


> From 0acbbd2fdd97bbafc5c4552e26f92d52c597eea9 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Wed, 25 May 2022 21:53:22 -0500
> Subject: [PATCH 4/8] cirrus/windows: add compiler_warnings_script
>
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpretting && and ||...
>
> https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de
>
> See also:
> 8a1ce5e54f6d144e4f8e19af7c767b026ee0c956
> https://cirrus-ci.com/task/6241060062494720
> https://cirrus-ci.com/task/6496366607204352
>
> ci-os-only: windows
> ---
>  .cirrus.yml                            | 10 +++++++++-
>  src/tools/ci/windows-compiler-warnings | 24 ++++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 1 deletion(-)
>  create mode 100755 src/tools/ci/windows-compiler-warnings
>
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 9f2282471a9..99ac09dc679 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -451,12 +451,20 @@ task:
>
>    build_script: |
>      vcvarsall x64
> -    ninja -C build
> +    ninja -C build |tee build/meson-logs/build.txt
> +    REM Since pipes lose exit status of the preceding command, rerun compilation,
> +    REM without the pipe exiting now if it fails, rather than trying to run checks
> +    ninja -C build > nul

This seems mighty grotty :(. but I guess it's quick enough not worry about,
and I can't come up with a better plan.

It doesn't seem quite right to redirect into meson-logs/ to me, my
interpretation is that that's "meson's namespace".  Why not just store it in
build/?



> From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sat, 26 Feb 2022 19:34:35 -0600
> Subject: [PATCH 5/8] cirrus: build docs as a separate task..
>
> This will run the doc build if any docs have changed, even if Linux
> fails, to allow catch doc build failures.
>
> This'll automatically show up as a separate "column" on cfbot.
>
> Also, in the future, this will hopefully upload each patch's changed HTML docs
> as an artifact, for easy review.
>
> Note that this is currently building docs with both autoconf and meson.
>
> ci-os-only: html
> ---
>  .cirrus.yml | 62 +++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 44 insertions(+), 18 deletions(-)
>
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 99ac09dc679..37fd79e5b77 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -472,6 +472,9 @@ task:
>        type: text/plain
>
>
> +###
> +# Test that code can be built with gcc/clang without warnings
> +###
>  task:
>    name: CompilerWarnings
>
> @@ -515,10 +518,6 @@ task:
>      #apt-get update
>      #DEBIAN_FRONTEND=noninteractive apt-get -y install ...
>
> -  ###
> -  # Test that code can be built with gcc/clang without warnings
> -  ###
> -

Why remove this?


>    setup_script: echo "COPT=-Werror" > src/Makefile.custom
>
>    # Trace probes have a history of getting accidentally broken. Use the
> @@ -580,20 +579,6 @@ task:
>        make -s -j${BUILD_JOBS} clean
>        time make -s -j${BUILD_JOBS} world-bin
>
> -  ###
> -  # Verify docs can be built
> -  ###
> -  # XXX: Only do this if there have been changes in doc/ since last build
> -  always:
> -    docs_build_script: |
> -      time ./configure \
> -        --cache gcc.cache \
> -        CC="ccache gcc" \
> -        CXX="ccache g++" \
> -        CLANG="ccache clang"
> -      make -s -j${BUILD_JOBS} clean
> -      time make -s -j${BUILD_JOBS} -C doc
> -
>    ###
>    # Verify headerscheck / cpluspluscheck succeed
>    #
> @@ -617,3 +602,44 @@ task:
>
>    always:
>      upload_caches: ccache
> +
> +
> +###
> +# Verify docs can be built
> +# changesInclude() will skip this task if none of the commits since
> +# CIRRUS_LAST_GREEN_CHANGE touched any relevant files. The comparison appears
> +# to be like "git log a..b -- ./file", not "git diff a..b -- ./file"
> +###
> +
> +task:
> +  name: Documentation
> +
> +  env:
> +    CPUS: 1
> +    BUILD_JOBS: 1
> +
> +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~
'.*\nci-os-only:[^\n]*(docs|html).*'
> +  skip: "!changesInclude('.cirrus.yml', 'doc/**')"

Perhaps we should introduce something other than ci-os-only if we want that to
include things like "docs and html".  At least this should update
src/tools/ci/README.


> +  sysinfo_script: |
> +    id
> +    uname -a
> +    cat /proc/cmdline
> +    ulimit -a -H && ulimit -a -S
> +    export

I think we can skip this here.


> +  # Exercise HTML and other docs:
> +  ninja_docs_build_script: |
> +    mkdir build.ninja
> +    cd build.ninja

Perhaps build-ninja instead? build.ninja is the filename for ninja's build
instructions, so it seems a bit confusing.


> From adebe93a4409990e929f2775d45c6613134a4243 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Tue, 26 Jul 2022 20:30:02 -0500
> Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys..
>
> Since otherwise, building with ci-os-only will probably fail to use the
> normal cache, since the cache key is computed using both the task name
> and its *index* in the list of caches (internal/executor/cache.go:184).

Seems like this would potentially better addressed by reporting a bug to the
cirrus folks?


>    ccache_cache:
>      folder: ${CCACHE_DIR}
> +    fingerprint_key: ccache/linux
> +    reupload_on_changes: true

There's enough copies of this to make it worth deduplicating. If we use
something like
   fingerprint_script: echo ccache/$CIRRUS_BRANCH/$CIRRUS_OS
we can use a yaml ref?


I think you experimented with creating a 'base' ccache dir (e.g. on the master
branch) and then using branch specific secondar caches? How did that turn out?
I think cfbot's caches constantly get removed due to overrunning the global
space.


> From f16739bc5d2087847129baf663aa25fa9edb8449 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 3 Apr 2022 00:10:20 -0500
> Subject: [PATCH 7/8] cirrus/ccache: disable compression and show stats

> Since v4.0, ccache enables zstd compression by default, saving roughly
> 2x-3x.  But, cirrus caches are compressed as tar.gz, so we could disable
> ccache compression, allowing cirrus to gzip the uncompressed data
> (better than ccache's default of zstd-1).
>
> With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
> linux/debian/bullseye has 4.2; 99MB cirrus cache; cache_size_kibibyte    109616
> macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte    52500
> freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte    134064
> XXX windows has 4.7.2; 180MB cirrus cache; cache_size_kibibyte    51179
> todo: compiler warnings
>
> With compression disabled (https://cirrus-ci.com/task/4614182514458624):
> linux: 91MB cirrus cache; cache_size_kibibyte    316136
> macos: 41MB cirrus cache; cache_size_kibibyte    118068
> windows: 42MB cirrus cache; cache_size_kibibyte    134064
> freebsd is the same

I'm still somewhat doubtful this is a good idea. The mingw cache is huge, for
example, and all that additional IO and memory usage is bound to show up.


> The stats should be shown and/or logged.
> ccache --show-stats shows the *cumulative* stats (including prior
> compilations)
> ccache --zero-stats clears out not only the global stats, but the
> per-file cache stats (from which the global stats are derived) - which
> obviously makes the cache work poorly.
>
> Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.
> The log should be written *outside* the ccache folder - it shouldn't be
> preserved across cirrusci task invocations.

I assume we don't have a new enough ccache everywhere yet?

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

13 November 2022, 23:53:04

On Fri, Nov 04, 2022 at 06:59:46PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-11-04 18:54:12 -0500, Justin Pryzby wrote:
> > Subject: [PATCH 1/8] meson: PROVE is not required
> > Subject: [PATCH 3/8] meson: rename 'main' tasks to 'regress' and 'isolation'
> 
> Pushed, thanks for the patches.

Thanks.

> > diff --git a/.cirrus.yml b/.cirrus.yml
> > index 9f2282471a9..99ac09dc679 100644
> > --- a/.cirrus.yml
> > +++ b/.cirrus.yml
> > @@ -451,12 +451,20 @@ task:
> >
> >    build_script: |
> >      vcvarsall x64
> > -    ninja -C build
> > +    ninja -C build |tee build/meson-logs/build.txt
> > +    REM Since pipes lose exit status of the preceding command, rerun compilation,
> > +    REM without the pipe exiting now if it fails, rather than trying to run checks
> > +    ninja -C build > nul
> 
> This seems mighty grotty :(. but I guess it's quick enough not worry about,
> and I can't come up with a better plan.
> 
> It doesn't seem quite right to redirect into meson-logs/ to me, my
> interpretation is that that's "meson's namespace".  Why not just store it in
> build/?

I put it there so it'd be included with the build artifacts.
Maybe it's worth adding a separate line to artifacts for stuff like
this, and ccache log ?

> > From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Sat, 26 Feb 2022 19:34:35 -0600
> > Subject: [PATCH 5/8] cirrus: build docs as a separate task..

> > +  # Exercise HTML and other docs:
> > +  ninja_docs_build_script: |
> > +    mkdir build.ninja
> > +    cd build.ninja
> 
> Perhaps build-ninja instead? build.ninja is the filename for ninja's build
> instructions, so it seems a bit confusing.

Sure.

Do you think building docs with both autoconf and meson is what's
desirable here ?

I'm not sure if this ought to be combined with/before/after your "move
compilerwarnings task to meson" patch?  (Regarding that patch: I
mentioned that it shouldn't use ccache -C, and it should use
meson_log_artifacts.)

> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Tue, 26 Jul 2022 20:30:02 -0500
> > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys..
> >
> > Since otherwise, building with ci-os-only will probably fail to use the
> > normal cache, since the cache key is computed using both the task name
> > and its *index* in the list of caches (internal/executor/cache.go:184).
> 
> Seems like this would potentially better addressed by reporting a bug to the
> cirrus folks?

You said that before, but I don't think so - since they wrote code to do
that, it's odd to file a bug that says that the behavior is wrong.  I am
curious why, but it seems delibrate.

https://www.postgresql.org/message-id/20220828171029.GO2342%40telsasoft.com

> There's enough copies of this to make it worth deduplicating. If we
> use something like fingerprint_script: echo
> ccache/$CIRRUS_BRANCH/$CIRRUS_OS we can use a yaml ref?

I'll look into it...

> I think you experimented with creating a 'base' ccache dir (e.g. on the master
> branch) and then using branch specific secondar caches?

I have to revisit that sometime.

That's a new feaure in ccache 4.4, which is currently only in macos.
This is another thing that it'd be easier to test by having cfbot
clobber the cirrus.yml rather than committing to postgres repo.
(Technically, it should probably only do use the in-testing cirrus.yml
if the patch it's testing doesn't itself modify .cirrus.yml)

> How did that turn out?  I think cfbot's caches constantly get removed
> due to overrunning the global space.

For cfbot, I don't know if there's much hope that any patch-specific
build artifacts will be cached from the previous run, typically ~24h
prior.

One idea I have, for the "Warnings" task (and maybe linux too), is to
defer pruning until after all the compilations.  To avoid LRU pruning
during early tasks causing bad hit ratios of later tasks.

Another thing that probably happens is that task1 starts compiling
patch1, and then another instance of task1 starts compiling patch2.  A
bit later, the first instance will upload its ccache result for patch1,
which will be summarily overwritten by the second instance's compilation
result, which doesn't include anything from the first instance.

Also, whenever ccache hits its MAXSIZE threshold, it prunes the cache
down to 80% of the configured size, which probably wipes away everything
from all but the most recent ~20 builds.

I also thought about having separate caches for each compilation in the
warnings task - but that requires too much repeated yaml just for that..

> > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > Date: Sun, 3 Apr 2022 00:10:20 -0500
> > Subject: [PATCH 7/8] cirrus/ccache: disable compression and show stats
> >
> > linux/debian/bullseye has 4.2; 99MB cirrus cache; cache_size_kibibyte    109616
> > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte    52500
> > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte    134064
> > XXX windows has 4.7.2; 180MB cirrus cache; cache_size_kibibyte    51179
> 
> I'm still somewhat doubtful this is a good idea. The mingw cache is huge, for
> example, and all that additional IO and memory usage is bound to show up.

I think you're referring to the proposed mingw task which runs under
windows, and not the existing cross-compilation ?

And you're right - I remember this now (I think it's due to PCH?)

In my local copy I'd "unset CCACHE_NOCOMPRESS".  But I view that as an
oddity of windows headers, rather than an argument against disabling
compression elsewhere.  BTW, freebsd ccache is too old to use
compression.

What about using CCACHE_HARDLINK (which implies no compression) ?

> > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.
> > The log should be written *outside* the ccache folder - it shouldn't be
> > preserved across cirrusci task invocations.
> 
> I assume we don't have a new enough ccache everywhere yet?

No - see above.

I've added patches to update macos.

-- 
Justin

Attachment

Re: CI and test improvements

From

Andres Freund

Date:

17 November 2022, 03:48:14

Hi,

On 2022-10-02 14:54:21 -0700, Andres Freund wrote:
> On 2022-10-02 16:35:06 -0500, Justin Pryzby wrote:
> > On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote:
> > > On 2022-10-01 18:36:41 -0700, Andres Freund wrote:
> > > > I am wondering if we should instead introduce a new "quickcheck" task that
> > > > just compiles and runs maybe one test and have *all* other tests depend on
> > > > that.  Wasting a precious available windows instance to just fail to build or
> > > > immediately fail during tests doesn't really make sense.
> > 
> > > With a primed cache this takes ~32s, not too bad imo. 12s of that is
> > > cloning the repo.
> > 
> > Maybe - that would avoid waiting 4 minutes for a windows instance to
> > start in the (hopefully atypical) case of a patch that fails in 1-2
> > minutes under linux/freebsd.
> > 
> > If the patch were completely broken, the windows task would take ~4min
> > to start, plus up to ~4min before failing to compile or failing an early
> > test.  6-8 minutes isn't nothing, but doesn't seem worth the added
> > complexity.
> 
> Btw, the motivation to work on this just now was that I'd like to enable more
> sanitizers (undefined,alignment for linux-meson, address for
> linux-autoconf). Yes, we could make the dependency on freebsd instead, but I'd
> like to try to enable the clang-only memory sanitizer there (if it works on
> freebsd)...

I've used this a bunch on personal branches, and I think it's the way to
go. It doesn't take long, saves a lot of cycles when one pushes something
broken. Starts to runs the CompilerWarnings task after a minimal amount of
sanity checking, instead of having to wait for a task running all tests,
without the waste of running it immediately and failing all the different
configurations, which takes forever.

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

17 November 2022, 03:58:39

On Wed, Nov 16, 2022 at 07:48:14PM -0800, Andres Freund wrote:
> I've used this a bunch on personal branches, and I think it's the way to
> go. It doesn't take long, saves a lot of cycles when one pushes something
> broken. Starts to runs the CompilerWarnings task after a minimal amount of
> sanity checking, instead of having to wait for a task running all tests,
> without the waste of running it immediately and failing all the different
> configurations, which takes forever.

Well, I don't hate it.

But I don't think you should call "ccache -z":

On Tue, Oct 18, 2022 at 12:09:30PM -0500, Justin Pryzby wrote:
> I realized that ccache -z clears out not only the global stats, but the
> per-file cache stats (from which the global stats are derived) - which
> obviously makes the cache work poorly.


-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

17 November 2022, 04:08:32

Hi,

On 2022-11-16 21:58:39 -0600, Justin Pryzby wrote:
> On Wed, Nov 16, 2022 at 07:48:14PM -0800, Andres Freund wrote:
> > I've used this a bunch on personal branches, and I think it's the way to
> > go. It doesn't take long, saves a lot of cycles when one pushes something
> > broken. Starts to runs the CompilerWarnings task after a minimal amount of
> > sanity checking, instead of having to wait for a task running all tests,
> > without the waste of running it immediately and failing all the different
> > configurations, which takes forever.
> 
> Well, I don't hate it.
> 
> But I don't think you should call "ccache -z":

Agreed - that was really just for "development" of the task.

I also don't like my "cores_script". Not quite sure yet how to do that
more cleanly.

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

17 November 2022, 04:16:31

On Wed, Nov 16, 2022 at 08:08:32PM -0800, Andres Freund wrote:
> I also don't like my "cores_script". Not quite sure yet how to do that
> more cleanly.

I don't know which is cleaner:

ls /core* && mv /tmp/core* /tmp/cores

find / -maxdepth 1 -type f -name 'core*' -print0 |
    xargs -r0 mv -vt /tmp/cores

for a in /core*; do [ ! -e "$a" ] || mv "$a" /tmp/cores; done

Re: CI and test improvements

From

Justin Pryzby

Date:

19 November 2022, 20:22:20

On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote:
>  
> +# To avoid unnecessarily spinning up a lot of VMs / containers for entirely
> +# broken commits, have a very minimal test that all others depend on.
> +task:
> +  name: SanityCheck

Maybe this should be named 00-SanityCheck, so it sorts first in cfbot ?

Also, if CompilerWarnings doesn't depend on Linux, that means those two
tasks will normally start and run simultaneously, which means a single
branch will use all 8 of the linux CPUs available from cirrus.  Is that
intentional?

-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

19 November 2022, 21:18:54

Hi,

On 2022-11-19 14:22:20 -0600, Justin Pryzby wrote:
> On Sun, Oct 02, 2022 at 01:52:01PM -0700, Andres Freund wrote:
> >  
> > +# To avoid unnecessarily spinning up a lot of VMs / containers for entirely
> > +# broken commits, have a very minimal test that all others depend on.
> > +task:
> > +  name: SanityCheck
> 
> Maybe this should be named 00-SanityCheck, so it sorts first in cfbot ?

Hm. Perhaps cfbot could just use the sorting from cirrus? I don't really like
the idea of making the names more confusing with numbered prefixes,
particularly when only used for some but not other tasks.

> Also, if CompilerWarnings doesn't depend on Linux, that means those two
> tasks will normally start and run simultaneously, which means a single
> branch will use all 8 of the linux CPUs available from cirrus.  Is that
> intentional?

I don't think that'd really make anything worse. But perhaps we could just
reduce the CPU count for linux autoconf by 1? I suspect that even with asan
enabled it'd still be roughly even with the rest.

I'll try to repost a version of the ubsan/asan patch together with the
sanitycheck patch and see how that looks.

Greetings,

Andres Freund

Re: CI and test improvements

From

Andres Freund

Date:

19 November 2022, 21:35:17

Hi,

On 2022-11-19 13:18:54 -0800, Andres Freund wrote:
> > Also, if CompilerWarnings doesn't depend on Linux, that means those two
> > tasks will normally start and run simultaneously, which means a single
> > branch will use all 8 of the linux CPUs available from cirrus.  Is that
> > intentional?
> 
> I don't think that'd really make anything worse. But perhaps we could just
> reduce the CPU count for linux autoconf by 1? I suspect that even with asan
> enabled it'd still be roughly even with the rest.

Hm, that doesn't suffice, because we allow 4 cores for the warnings task. The
limit for cirrus is 16 linux CPUs though, not 8. We'll temporarily go up to 12
due to CompilerWarnings after the change. But I think that's fine, because
we'd previously use the same amount of CPUs, just some of it
sequentially.

From the POV of linux CPUs we'd still be able to start a second task
concurrently without delaying the sanitycheck task, and then at max delaying
one of the other linux tasks (meson, autoconf, compiler warnings).

The limit is, and continues to be, be the number of concurrent macos
VMs. Might be better after moving to m1 macs.

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

19 November 2022, 21:45:06

On Sat, Nov 19, 2022 at 01:18:54PM -0800, Andres Freund wrote:
> > Also, if CompilerWarnings doesn't depend on Linux, that means those two
> > tasks will normally start and run simultaneously, which means a single
> > branch will use all 8 of the linux CPUs available from cirrus.  Is that
> > intentional?
> 
> I don't think that'd really make anything worse. But perhaps we could just
> reduce the CPU count for linux autoconf by 1?

I didn't understand the goal of "reducing by one" ?

Up to now, most tasks are using half of the available CPUs, which seemed
deliberate.  Like maybe to allow running two branches simultaneously
(that doesn't necessarily work well with ccache, though).

On Sat, Nov 19, 2022 at 01:35:17PM -0800, Andres Freund wrote:
> The limit for cirrus is 16 linux CPUs though, not 8.

Oh.  Then I don't see any issue.

> We'll temporarily go up to 12 due to CompilerWarnings after the change.

What do you mean "temporarily" ?  I think you're implying that the
Warnings task is fast but (at least right now) it is not.

Note that the most recent "code coverage" task is built into the
linux-autoconf task, and slows it down some more.  That's because it's
the only remaining in-tree build, and I aimed to only show coverage for
changed files (I know you questioned whether that was okay, but to me it
still seems to be valuable, even though it obviously doesn't show
changes outside of those files).  And I couldn't see how to map from
"object filename to source file" with meson, although I guess it's
possible with instrospection.  I haven't re-sent that patch because it's
waiting on cfbot changes.

-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

19 November 2022, 22:14:17

Hi,

On 2022-11-19 15:45:06 -0600, Justin Pryzby wrote:
> What do you mean "temporarily" ?  I think you're implying that the
> Warnings task is fast but (at least right now) it is not.

In the sense that we don't need all CPUs until the whole commit has finished
testing (none of the tasks are the slowest task, even after ubsan/asan). As
soon as one of the linux tests has finished for one commit, another task in a
concurrently tested commit can start.  Whereas that's not the case for macos,
due to the VM limit.

(cfbot has double the limits, because it has a 10$/mo account)

> Note that the most recent "code coverage" task is built into the
> linux-autoconf task, and slows it down some more.  That's because it's
> the only remaining in-tree build, and I aimed to only show coverage for
> changed files (I know you questioned whether that was okay, but to me it
> still seems to be valuable, even though it obviously doesn't show
> changes outside of those files).

I think we shouldn't add further tests using autoconf, that'll just mean we'll
have to do the work changing that test at some later point.

> And I couldn't see how to map from
> "object filename to source file" with meson, although I guess it's
> possible with instrospection.  I haven't re-sent that patch because it's
> waiting on cfbot changes.

The object files should have that in their metadata, fwiw.

Greetings,

Andres Freund

Re: CI and test improvements

From

Andres Freund

Date:

21 November 2022, 22:09:03

Hi,

On 2022-11-19 13:18:54 -0800, Andres Freund wrote:
> I'll try to repost a version of the ubsan/asan patch together with the
> sanitycheck patch and see how that looks.

I just pushed the prerequisite patch making UBSAN_OPTIONS work. Attached
is 1) addition of SanityCheck 2) use of asan and ubsan+alignment san to
the linux tasks.

I went with a variation of the find command for SanityCheck's
cores_script, but used -exec to invoke mv, as that results in a nicer
looking commandline imo.

Previously the SanityCheck patch did trigger warnings about only_if not
matching, despite SanityCheck not having an only_if, but I reported that
as a bug to cirrus-ci, and they fixed that.

Pretty happy with this.

Greetings,

Andres Freund

Attachment

Re: CI and test improvements

From

Andres Freund

Date:

21 November 2022, 22:45:42

Hi,

On 2022-11-13 17:53:04 -0600, Justin Pryzby wrote:
> On Fri, Nov 04, 2022 at 06:59:46PM -0700, Andres Freund wrote:
> > > diff --git a/.cirrus.yml b/.cirrus.yml
> > > index 9f2282471a9..99ac09dc679 100644
> > > --- a/.cirrus.yml
> > > +++ b/.cirrus.yml
> > > @@ -451,12 +451,20 @@ task:
> > >
> > >    build_script: |
> > >      vcvarsall x64
> > > -    ninja -C build
> > > +    ninja -C build |tee build/meson-logs/build.txt
> > > +    REM Since pipes lose exit status of the preceding command, rerun compilation,
> > > +    REM without the pipe exiting now if it fails, rather than trying to run checks
> > > +    ninja -C build > nul
> > 
> > This seems mighty grotty :(. but I guess it's quick enough not worry about,
> > and I can't come up with a better plan.
> > 
> > It doesn't seem quite right to redirect into meson-logs/ to me, my
> > interpretation is that that's "meson's namespace".  Why not just store it in
> > build/?
> 
> I put it there so it'd be included with the build artifacts.

Wouldn't just naming it build-warnings.log suffice? I don't think we
want to actually upload build.txt - it already is captured.


> > > From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001
> > > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > > Date: Sat, 26 Feb 2022 19:34:35 -0600
> > > Subject: [PATCH 5/8] cirrus: build docs as a separate task..
> 
> > > +  # Exercise HTML and other docs:
> > > +  ninja_docs_build_script: |
> > > +    mkdir build.ninja
> > > +    cd build.ninja
> > 
> > Perhaps build-ninja instead? build.ninja is the filename for ninja's build
> > instructions, so it seems a bit confusing.
> 
> Sure.
> 
> Do you think building docs with both autoconf and meson is what's
> desirable here ?

Not sure.


> I'm not sure if this ought to be combined with/before/after your "move
> compilerwarnings task to meson" patch?  (Regarding that patch: I
> mentioned that it shouldn't use ccache -C, and it should use
> meson_log_artifacts.)

TBH, I'm not quite sure a separate docs task does really still make
sense after the SanityCheck task. It's worth building the docs even if
some flappy test fails, but I don't think we should try to build the
docs if the code doesn't even compile, in all likelihood a lot more is
wrong in that case.


> > > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > > Date: Tue, 26 Jul 2022 20:30:02 -0500
> > > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys..
> > >
> > > Since otherwise, building with ci-os-only will probably fail to use the
> > > normal cache, since the cache key is computed using both the task name
> > > and its *index* in the list of caches (internal/executor/cache.go:184).
> > 
> > Seems like this would potentially better addressed by reporting a bug to the
> > cirrus folks?
> 
> You said that before, but I don't think so - since they wrote code to do
> that, it's odd to file a bug that says that the behavior is wrong.  I am
> curious why, but it seems delibrate.
> 
> https://www.postgresql.org/message-id/20220828171029.GO2342%40telsasoft.com

I suspect this is just about dealing with unnamed tasks and could be
handled by just mixing in CI_NODE_INDEX if the task name isn't set.


I pushed a version of 0007-cirrus-clean-up-windows-task.patch. I didn't
rename the task as I would like to add a msbuild version of the task at
some point (it's pretty easy to break msbuild but not ninja
unfortunately). In additional I also removed NO_TEMP_INSTALL.

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

22 November 2022, 22:57:44

On Mon, Nov 21, 2022 at 02:45:42PM -0800, Andres Freund wrote:
> > > > +    ninja -C build |tee build/meson-logs/build.txt
> > > > +    REM Since pipes lose exit status of the preceding command, rerun compilation,
> > > > +    REM without the pipe exiting now if it fails, rather than trying to run checks
> > > > +    ninja -C build > nul
> > > 
> > > This seems mighty grotty :(. but I guess it's quick enough not worry about,
> > > and I can't come up with a better plan.
> > > 
> > > It doesn't seem quite right to redirect into meson-logs/ to me, my
> > > interpretation is that that's "meson's namespace".  Why not just store it in
> > > build/?
> > 
> > I put it there so it'd be included with the build artifacts.
> 
> Wouldn't just naming it build-warnings.log suffice? I don't think we
> want to actually upload build.txt - it already is captured.

Originally, I wanted the input and the output to be available as files
and not just in cirrus' web GUI, but maybe that's not important anymore.
I rewrote it again.

> > I'm not sure if this ought to be combined with/before/after your "move
> > compilerwarnings task to meson" patch?  (Regarding that patch: I
> > mentioned that it shouldn't use ccache -C, and it should use
> > meson_log_artifacts.)
> 
> TBH, I'm not quite sure a separate docs task does really still make
> sense after the SanityCheck task. It's worth building the docs even if
> some flappy test fails, but I don't think we should try to build the
> docs if the code doesn't even compile, in all likelihood a lot more is
> wrong in that case.

It'd be okay either way.  I had split it out to 1) isolate the changes
in the "upload changed docs as artifacts" patch; and, 2) so the docs
artifacts are visible in a cfbot link called "Documentation"; and, 3) so
the docs task runs without a dependency on "Linux", since (as you said)
docs/errors are worth showing/reviewing/reporting/addressing separately
from test errors (perhaps similar to compiler warnings...).

I shuffled my branch around and sending now the current "docs" patches,
but I suppose this is waiting on the "convert CompilerWarnings task to
meson" patch.

-- 
Justin

Attachment

Re: CI and test improvements

From

Thomas Munro

Date:

30 December 2022, 03:59:03

On Wed, Nov 23, 2022 at 11:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> [PATCH 02/10] cirrus/macos: switch to "macos_instance" / M1..

Duelling patches.

Bilal's patch[1] uses the matrix feature to run the tests on both
Intel and ARM, which made sense when he proposed it, but according to
Cirrus CI warnings, the Intel instances are about to go away.  So I
think we just need your smaller change to switch the instance type.

As for the pathname change, there is another place that knows where
Homebrew lives, in ldap/001_auth.  Fixed in the attached.  That test
just SKIPs if it can't find the binary, making it harder to notice.
Standardising our approach here might make sense for a later patch.
As for the kerberos test, Bilal's patch may well be a better idea (it
adds MacPorts for one thing), and so I'll suggest rebasing that, but
here I just wanted the minimum mechanical fix to avoid breaking on the
1st of Jan.

I plan to push this soon if there are no objections.  Then discussion
of Bilal's patch can continue.

> [PATCH 03/10] cirrus/macos: update to macos ventura

I don't know any reason not to push this one too, but it's not time critical.

[1] https://www.postgresql.org/message-id/flat/CAN55FZ2R%2BXufuVgJ8ew_yDBk48PgXEBvyKNvnNdTTVyczbQj0g%40mail.gmail.com

Attachment

0001-ci-Change-macOS-builds-from-Intel-to-ARM.patch

Re: CI and test improvements

From

vignesh C

Date:

03 January 2023, 12:16:59

On Fri, 30 Dec 2022 at 09:29, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Nov 23, 2022 at 11:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > [PATCH 02/10] cirrus/macos: switch to "macos_instance" / M1..
>
> Duelling patches.
>
> Bilal's patch[1] uses the matrix feature to run the tests on both
> Intel and ARM, which made sense when he proposed it, but according to
> Cirrus CI warnings, the Intel instances are about to go away.  So I
> think we just need your smaller change to switch the instance type.
>
> As for the pathname change, there is another place that knows where
> Homebrew lives, in ldap/001_auth.  Fixed in the attached.  That test
> just SKIPs if it can't find the binary, making it harder to notice.
> Standardising our approach here might make sense for a later patch.
> As for the kerberos test, Bilal's patch may well be a better idea (it
> adds MacPorts for one thing), and so I'll suggest rebasing that, but
> here I just wanted the minimum mechanical fix to avoid breaking on the
> 1st of Jan.
>
> I plan to push this soon if there are no objections.  Then discussion
> of Bilal's patch can continue.
>
> > [PATCH 03/10] cirrus/macos: update to macos ventura
>
> I don't know any reason not to push this one too, but it's not time critical.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
e351f85418313e97c203c73181757a007dfda6d0 ===
=== applying patch ./0001-ci-Change-macOS-builds-from-Intel-to-ARM.patch
patching file .cirrus.yml
Hunk #1 FAILED at 407.
Hunk #2 FAILED at 428.
Hunk #3 FAILED at 475.
3 out of 3 hunks FAILED -- saving rejects to file .cirrus.yml.rej
patching file src/test/kerberos/t/001_auth.pl
Hunk #1 FAILED at 32.
1 out of 1 hunk FAILED -- saving rejects to file
src/test/kerberos/t/001_auth.pl.rej
patching file src/test/ldap/t/001_auth.pl
Hunk #1 FAILED at 21.
1 out of 1 hunk FAILED -- saving rejects to file src/test/ldap/t/001_auth.pl.rej

[1] - http://cfbot.cputube.org/patch_41_3709.log

Regards,
Vigneh

Re: CI and test improvements

From

Justin Pryzby

Date:

04 January 2023, 23:19:24

On Mon, Nov 21, 2022 at 02:45:42PM -0800, Andres Freund wrote:
> On 2022-11-13 17:53:04 -0600, Justin Pryzby wrote:
> > > > From: Justin Pryzby <pryzbyj@telsasoft.com>
> > > > Date: Tue, 26 Jul 2022 20:30:02 -0500
> > > > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys..
> > > >
> > > > Since otherwise, building with ci-os-only will probably fail to use the
> > > > normal cache, since the cache key is computed using both the task name
> > > > and its *index* in the list of caches (internal/executor/cache.go:184).
> > > 
> > > Seems like this would potentially better addressed by reporting a bug to the
> > > cirrus folks?
> > 
> > You said that before, but I don't think so - since they wrote code to do
> > that, it's odd to file a bug that says that the behavior is wrong.  I am
> > curious why, but it seems delibrate.
> > 
> > https://www.postgresql.org/message-id/20220828171029.GO2342%40telsasoft.com
> 
> I suspect this is just about dealing with unnamed tasks and could be
> handled by just mixing in CI_NODE_INDEX if the task name isn't set.

I suppose it was their way of dealing with this:

|Cache artifacts are shared between tasks, so two caches with the same
|name on e.g. Linux containers and macOS VMs will share the same set of
|files. This may introduce binary incompatibility between caches. To
|avoid that, add echo $CIRRUS_OS into fingerprint_script or use
|$CIRRUS_OS in fingerprint_key, which will distinguish caches based on
|OS.

To make caches work automatically, without having to know to name them
differently.

-- 
Justin

Re: CI and test improvements

From

Justin Pryzby

Date:

04 January 2023, 23:44:24

On Tue, Nov 22, 2022 at 04:57:44PM -0600, Justin Pryzby wrote:
> I shuffled my branch around and sending now the current "docs" patches,
> but I suppose this is waiting on the "convert CompilerWarnings task to
> meson" patch.

In case it's not, here's a version to do that now.

Attachment

Re: CI and test improvements

From

Justin Pryzby

Date:

17 January 2023, 17:35:09

The autoconf system runs all tap tests in t/*.pl, but meson requires
enumerating them in ./meson.build.

This checks for and finds no missing tests in the current tree:

$ for pl in `find src contrib -path '*/t/*.pl'`; do base=${pl##*/}; dir=${pl%/*}; meson=${dir%/*}/meson.build; grep
"$base""$meson" >/dev/null || echo "$base is missing from $meson"; done
 

However, this finds two real problems and one false-positive with
missing regress/isolation tests:

$ for makefile in `find src contrib -name Makefile`; do for testname in `sed -r '/^(REGRESS|ISOLATION) =/!d; s///; :l;
/\\\\$/{s///;N; b l}; s/\n//g' "$makefile"`; do meson=${makefile%/Makefile}/meson.build; grep -Fw "$testname" "$meson"
>/dev/null|| echo "$testname is missing from $meson"; done; done
 
guc_privs is missing from src/test/modules/unsafe_tests/meson.build
oldextversions is missing from contrib/pg_stat_statements/meson.build
$(CF_PGP_TESTS) is missing from contrib/pgcrypto/meson.build

I also tried but failed to write something to warn if "meson test" was
run with a list of tests but without tmp_install.  Help wanted.

I propose to put something like this into "SanityCheck".

-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

17 January 2023, 19:56:42

Hi,

On 2023-01-17 11:35:09 -0600, Justin Pryzby wrote:
> The autoconf system runs all tap tests in t/*.pl, but meson requires
> enumerating them in ./meson.build.

Yes. It was a mistake that we ever used t/*.pl for make. For one, it means
that make can't control concurrency meaningfully, due to the varying number of
tests run with one prove instance. It's also the only thing that tied us to
prove, which is one hell of a buggy mess.

> This checks for and finds no missing tests in the current tree:
>
> $ for pl in `find src contrib -path '*/t/*.pl'`; do base=${pl##*/}; dir=${pl%/*}; meson=${dir%/*}/meson.build; grep
"$base""$meson" >/dev/null || echo "$base is missing from $meson"; done

Likely because I do something similar locally.

# prep
m test --list > /tmp/tests.txt

# Check if all tap tests are known to meson
for f in $(git ls-files|grep -E '(t|test)/.*.pl$'|sort);do t=$(echo $f|sed -E -e
's/^.*\/([^/]*)\/(t|test)\/(.*)\.pl$/\1\/\3/');grep-q -L $t /tmp/tests.txt |\

| echo $f;done

# Check if all regression / isolation tests are known to meson
#
# Expected to find plpgsql due to extra 'src' directory level, src/test/mb
# because it's not run anywhere and sepgsql, because that's not tested yet
for d in $(find ~/src/postgresql -type d \( -name sql -or -name specs \) );do t=$(basename $(dirname $d)); grep -q -L
$t/tmp/tests.txt || echo $d; done

> However, this finds two real problems and one false-positive with
> missing regress/isolation tests:

Which the above does *not* test for. Good catch.

I'll push the fix for those as soon as tests passed on my personal repo.

> $ for makefile in `find src contrib -name Makefile`; do for testname in `sed -r '/^(REGRESS|ISOLATION) =/!d; s///;
:l;/\\\\$/{s///; N; b l}; s/\n//g' "$makefile"`; do meson=${makefile%/Makefile}/meson.build; grep -Fw "$testname"
"$meson">/dev/null || echo "$testname is missing from $meson"; done; done

> guc_privs is missing from src/test/modules/unsafe_tests/meson.build

Yep. That got added during the development of the meson port, so it's not too surprising.

> oldextversions is missing from contrib/pg_stat_statements/meson.build

This one however, is odd. Not sure how that happened.

> $(CF_PGP_TESTS) is missing from contrib/pgcrypto/meson.build

Assume that's the false positive?

> I also tried but failed to write something to warn if "meson test" was
> run with a list of tests but without tmp_install.  Help wanted.

That doesn't even catch the worst case - when there's tmp_install, but it's
too old.

The proper solution would be to make the creation of tmp_install a dependency
of the relevant tests. Unfortunately meson still auto-propages those to
dependencies of the 'all' target (for historical reasons), and creating the
temp install is too slow on some machines to make that tolerable. I think
there's an open PR to change that. Once that's in a released meson version
that's in somewhat widespread use, we should change that.

The other path forward is to allow running the tests without
tmp_install. There's not that much we'd need to allow running directly from
the source tree - the biggest thing is a way to load extensions from a list of
paths. This option is especially attractive because it'd allow to run
individual tests without a fully built sourcetree. No need to build other
binaries when you just want to test psql, or more extremely, pg_test_timing.

> I propose to put something like this into "SanityCheck".

Perhaps we instead could add it as a separate "meson-only" test? Then it'd
fail on developer's machines, instead of later in CI. We could pass the test
information from the 'tests' array, or it could look at the metadata in
meson-info/intro-tests.json

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

18 January 2023, 01:10:18

On Tue, Jan 17, 2023 at 11:56:42AM -0800, Andres Freund wrote:
> > $(CF_PGP_TESTS) is missing from contrib/pgcrypto/meson.build
> 
> Assume that's the false positive?

Yes

> > I also tried but failed to write something to warn if "meson test" was
> > run with a list of tests but without tmp_install.  Help wanted.
> 
> That doesn't even catch the worst case - when there's tmp_install, but it's
> too old.

I don't understand what you mean by "too old" ?

> > I propose to put something like this into "SanityCheck".
> 
> Perhaps we instead could add it as a separate "meson-only" test? Then it'd
> fail on developer's machines, instead of later in CI. We could pass the test
> information from the 'tests' array, or it could look at the metadata in
> meson-info/intro-tests.json

I guess you mean that it should be *able* to fail on developer machines
*in addition* to cirrusci.

But, a meson-only test might not be so helpful, as it assumes that the
developer is using meson, in which case the problem would tend not to
have occured in the first place.

BTW I also noticed that:

meson.build:meson_binpath_r = run_command(python, 'src/tools/find_meson', check: true)
meson.build-
meson.build-if meson_binpath_r.returncode() != 0 or meson_binpath_r.stdout() == ''
meson.build-  error('huh, could not run find_meson.\nerrcode: @0@\nstdout: @1@\nstderr: @2@'.format(

The return code will never be nonzero since check==true, right ?

-- 
Justin

Re: CI and test improvements

From

Thomas Munro

Date:

02 February 2023, 01:02:25

On Fri, Dec 30, 2022 at 4:59 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Nov 23, 2022 at 11:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > [PATCH 03/10] cirrus/macos: update to macos ventura
>
> I don't know any reason not to push this one too, but it's not time critical.

Some observations:

* macOS has a new release every year in June[1]
* updates cease after three years[1]
* thus three releases are in support (by that definition) at a time
* we need an image on Cirrus; 13 appeared ~1 month later[2]
* we need Homebrew support; 13 appeared ~3 months later[3]
* we have 13 and 12 in the buildfarm, but no 11
* it's common for developers but uncommon for servers/deployment

So what should our policy be on when to roll the CI image forward?  I
guess around New Year/now (~6 months after release) is a good time and
we should just do it.  Anyone got a reason why we should wait?  Our
other CI OSes have slower major version release cycles and longer
lives, so it's not quite the same hamster wheel of upgrades.

[1] https://en.wikipedia.org/wiki/MacOS_version_history#Releases
[2] https://github.com/orgs/cirruslabs/packages?tab=packages&q=macos
[3] https://brew.sh/2022/09/07/homebrew-3.6.0/

Re: CI and test improvements

From

Tom Lane

Date:

02 February 2023, 01:12:12

Thomas Munro <thomas.munro@gmail.com> writes:
> Some observations:

> * macOS has a new release every year in June[1]
> * updates cease after three years[1]
> * thus three releases are in support (by that definition) at a time
> * we need an image on Cirrus; 13 appeared ~1 month later[2]
> * we need Homebrew support; 13 appeared ~3 months later[3]
> * we have 13 and 12 in the buildfarm, but no 11
> * it's common for developers but uncommon for servers/deployment

> So what should our policy be on when to roll the CI image forward?  I
> guess around New Year/now (~6 months after release) is a good time and
> we should just do it.  Anyone got a reason why we should wait?  Our
> other CI OSes have slower major version release cycles and longer
> lives, so it's not quite the same hamster wheel of upgrades.

I'd argue that developers are probably the kind of people who update
their OS sooner rather than later --- I've usually updated my laptop
and at least one BF animal to $latest macOS within a month or so of
the dot-zero release.  So waiting 6 months seems to me like CI will be
behind the users, which will be unhelpful.  I'd rather drop the oldest
release sooner, if we need to hold down the number of macOS revisions
under test.

            regards, tom lane

Re: CI and test improvements

From

Thomas Munro

Date:

03 February 2023, 01:58:04

On Thu, Feb 2, 2023 at 2:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Some observations:
> > So what should our policy be on when to roll the CI image forward?  I
> > guess around New Year/now (~6 months after release) is a good time and
> > we should just do it.  Anyone got a reason why we should wait?  Our
> > other CI OSes have slower major version release cycles and longer
> > lives, so it's not quite the same hamster wheel of upgrades.
>
> I'd argue that developers are probably the kind of people who update
> their OS sooner rather than later --- I've usually updated my laptop
> and at least one BF animal to $latest macOS within a month or so of
> the dot-zero release.  So waiting 6 months seems to me like CI will be
> behind the users, which will be unhelpful.  I'd rather drop the oldest
> release sooner, if we need to hold down the number of macOS revisions
> under test.

Cool.  Done.

Out of curiosity, I wondered how the "graphical installer" packagers
like EDB and Postgres.app choose a target, when Apple is moving so
fast.  I see that the current EDB installers target 10.14 for PG15,
which was 5 years old at initial release, and thus already EOL'd for 2
years.  Postgres.app goes back one more year.  In other words, even
though that preadv/pwritev "decl" stuff is unnecessary for PG16 if you
think we should only target OSes that the vendor still supports (which
will be 12, 13, 14), someone would still shout at me if I removed it.

Re: CI and test improvements

From

Justin Pryzby

Date:

03 February 2023, 14:26:57

rebased, and re-including a patch to show code coverage of changed
files.

a5b3e50d922 cirrus/windows: add compiler_warnings_script
4c98dcb0e03 cirrus/freebsd: run with more CPUs+RAM and do not repartition
aaeef938ed4 cirrus/freebsd: define ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
9baf41674ad pg_upgrade: tap test: exercise --link and --clone
7e09035f588 WIP: ci/meson: allow showing only failed tests ..
e4534821ef5 cirrus/ccache: use G rather than GB suffix..
185d1c3ed13 cirrus: code coverage
5dace84a038 cirrus: upload changed html docs as artifacts
852360330ef +html index file

Attachment

Re: CI and test improvements

From

Peter Eisentraut

Date:

13 March 2023, 06:39:52

On 03.02.23 15:26, Justin Pryzby wrote:
> rebased, and re-including a patch to show code coverage of changed
> files.

This constant flow of patches under one subject doesn't lend itself well 
to the commit fest model of trying to finish things up.  I can't quite 
tell which of these patches are ready and agreed upon, and which ones 
are work in progress or experimental.

 > e4534821ef5 cirrus/ccache: use G rather than GB suffix..

This one seems obvious.  I have committed it.

> 9baf41674ad pg_upgrade: tap test: exercise --link and --clone

This seems like a good idea.

> 7e09035f588 WIP: ci/meson: allow showing only failed tests ..

I'm not sure I like this one.  I sometimes look up the logs of 
non-failed tests to compare them with failed tests, to get context to 
could lead to failures.  Maybe we can make this behavior adjustable. 
But I've not been bothered by the current behavior.

Re: CI and test improvements

From

Justin Pryzby

Date:

14 March 2023, 04:56:56

On Mon, Mar 13, 2023 at 07:39:52AM +0100, Peter Eisentraut wrote:
> On 03.02.23 15:26, Justin Pryzby wrote:
> > rebased, and re-including a patch to show code coverage of changed
> > files.
> 
> This constant flow of patches under one subject doesn't lend itself well to
> the commit fest model of trying to finish things up.
> I can't quite tell which of these patches are ready and agreed upon,
> and which ones are work in progress or experimental.

I'm soliticing feedback on those patches that I've sent recently - I've
elided patches if they have some unresolved issue.

I'm not aware of any loose ends other than what's updated here:

- cirrus: code coverage

I changed this to also run an "initial" coverage report before running
tests.  It's not clear to me what effect that has, though...

Andres seems to think it's a problem that this shows coverage only for
files that were actually changed.  But that's what's intended; it's
sufficient to see if new code is being hit by tests.  It would be slow
and take a lot of extra space to upload a coverage report for every
patch, every day.  It might be nice for cfbot to show how test coverage
changed in the affected files: -15% / +25%.

- cirrus: upload changed html docs as artifacts

Fixed an "only_if" line so cfbot will run the "warnings" task.

Maybe this path is waiting on Andres' patch to "move CompilerWarnings to
meson" ?

> > 7e09035f588 WIP: ci/meson: allow showing only failed tests ..
> 
> I'm not sure I like this one.  I sometimes look up the logs of non-failed
> tests to compare them with failed tests, to get context to could lead to
> failures.  Maybe we can make this behavior adjustable. But I've not been
> bothered by the current behavior.

It's adjustable by un/setting the environment variable.

I'm surprised to hear that anyone using cirrusci (with or without cfbot)
wouldn't prefer the behavior this patch implements.  It's annoying to
search find the logs for the (typically exactly one) failing test in
cirrus' directory of 200some test artifacts.  We're also uploading a lot
of logs for every failure.  (But I suppose this might break cfbot's new
client side parsing of things like build logs...)

-- 
Justin

Attachment

Re: CI and test improvements

From

Peter Eisentraut

Date:

15 March 2023, 09:58:41

On 14.03.23 05:56, Justin Pryzby wrote:
> I'm soliticing feedback on those patches that I've sent recently - I've
> elided patches if they have some unresolved issue.

 > [PATCH 1/8] cirrus/windows: add compiler_warnings_script

Needs a better description of what it actually does.  (And fewer "I'm 
not sure how to write this ..." comments ;-) )  It looks like it would 
fail the build if there is a compiler warning in the Windows VS task? 
Shouldn't that be done in the CompilerWarnings task?

Also, I see a bunch of warnings in the current output from that task. 
These should be cleaned up in any case before we can let a thing like 
this loose.

(The warnings are all like

C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro 
redefinition

so possibly a single fix can address them all.)

 > [PATCH 2/8] cirrus/freebsd: run with more CPUs+RAM and do not repartition

I don't know enough about this.  Maybe Andres or Thomas want to take 
this.  No concerns if it's safe.

 > [PATCH 3/8] cirrus/freebsd: define 
ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS

Looks sensible.

 > [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone

I haven't been able to get any changes to the test run times outside of 
noise from this.  But some more coverage is sensible in any case.

I'm concerned that with this change, the only platform that tests --copy 
is Windows, but Windows has a separate code path for copy.  So we should 
leave one Unix platform to test --copy.  Maybe have FreeBSD test --link 
and macOS test --clone and leave the others with --copy?

 > [PATCH 5/8] WIP: ci/meson: allow showing only failed tests ..

>>> 7e09035f588 WIP: ci/meson: allow showing only failed tests ..
>>
>> I'm not sure I like this one.  I sometimes look up the logs of non-failed
>> tests to compare them with failed tests, to get context to could lead to
>> failures.  Maybe we can make this behavior adjustable. But I've not been
>> bothered by the current behavior.
> 
> It's adjustable by un/setting the environment variable.
> 
> I'm surprised to hear that anyone using cirrusci (with or without cfbot)
> wouldn't prefer the behavior this patch implements.  It's annoying to
> search find the logs for the (typically exactly one) failing test in
> cirrus' directory of 200some test artifacts.  We're also uploading a lot
> of logs for every failure.  (But I suppose this might break cfbot's new
> client side parsing of things like build logs...)

One thing that actually annoys me that is that a successful run does not 
upload any test artifacts at all.  So, I guess I'm just of a different 
opinion here.

 > [PATCH 6/8] cirrus: code coverage

This adds -Db_coverage=true to the FreeBSD task.  This has a significant 
impact on the build time.  (+50% at least, it appears.)

I'm not sure the approach here makes sense.  For example, if you add a 
new test, the set of changed files is just that test.  So you won't get 
any report on what coverage change the test has caused.

Also, I don't think I trust the numbers from the meson coverage stuff 
yet.  See for example 
<https://www.postgresql.org/message-id/Y/3AI+/MqKcjLk/T@paquier.xyz>.

 > [PATCH 7/8] cirrus: upload changed html docs as artifacts
 > [PATCH 8/8] +html index file

This builds the docs twice and then analyzes the differences between the 
two builds.  This also affects the build times quite significantly.

How useful is this actually?  People who want to look at the docs can 
build them locally.  There are no platform dependencies or anything like 
that where having them built elsewhere is of advantage.

Re: CI and test improvements

From

Justin Pryzby

Date:

15 March 2023, 14:56:12

On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote:
> On 14.03.23 05:56, Justin Pryzby wrote:
> > I'm soliticing feedback on those patches that I've sent recently - I've
> > elided patches if they have some unresolved issue.
> 
> > [PATCH 1/8] cirrus/windows: add compiler_warnings_script
> 
> Needs a better description of what it actually does.  (And fewer "I'm not
> sure how to write this ..." comments ;-) )  It looks like it would fail the
> build if there is a compiler warning in the Windows VS task? Shouldn't that
> be done in the CompilerWarnings task?

The goal is to fail due to warnings only after running tests.

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de
"Probably worth scripting something to make the windows task error out
if there had been warnings, but only after running the tests."

CompilerWarnings runs in a linux environment running with -Werror.  
This patch scrapes warnings out of MSVC, since (at least historically)
it's too slow to run a separate windows VM to compile with -Werror.

> Also, I see a bunch of warnings in the current output from that task. These
> should be cleaned up in any case before we can let a thing like this loose.

Yeah (and I mentioned those myself).  As it stands, my patch also
"breaks" everytime someone's else's patch introduces warnings.  I
included links demonstrating its failures.

I agree that it's not okay to merge the patch when it's currently
failing, but I cannot dig into that other issue right now.

> > [PATCH 6/8] cirrus: code coverage
> 
> This adds -Db_coverage=true to the FreeBSD task.  This has a significant
> impact on the build time.  (+50% at least, it appears.)

Yes - but with the CPUs added by the prior patch, the freebsd task is
faster than it is currently.  And its 8min runtime would match the other
tasks well.

> I'm not sure the approach here makes sense.  For example, if you add a new
> test, the set of changed files is just that test.  So you won't get any
> report on what coverage change the test has caused.

The coverage report that I proposed clearly doesn't handle that case -
it's not intended to.

Showing a full coverage report is somewhat slow to generate, probably
unreasonable to upload for every patch, every day, and not very
interesting since it's at least 99% duplicative.  The goal is to show a
coverage report for new code for every patch.  What fraction of the time
do you think the patch author, reviewer or committer have looked at a
coverage report?  It's not a question of whether it's possible to do so
locally, but of whether it's actually done.

> Also, I don't think I trust the numbers from the meson coverage stuff yet.
> See for example
> <https://www.postgresql.org/message-id/Y/3AI+/MqKcjLk/T@paquier.xyz>.

I'm not using the meson coverage target.  I could instead add
CFLAGS=--coverage.  Anyway, getting a scalar value like "83%" might be
interesting to show in cfbot, but it's not the main goal here.

> > [PATCH 7/8] cirrus: upload changed html docs as artifacts
> > [PATCH 8/8] +html index file
> 
> This builds the docs twice and then analyzes the differences between the two
> builds.  This also affects the build times quite significantly.

The main goal is to upload the changed docs.

> People who want to look at the docs can build them locally.

This makes the docs for every patch available for reviewers, without
needing a build environment.  An easy goal would be if documentation for
every patch was reviewed by a native english speaker.  Right now that's
not consistently true.

> How useful is this actually?

I'm surprised if there's any question about the merits of making
documentation easily available for review.  Several people have agreed;
one person mailed me privately specifically to ask how to show HTML docs
on cirrusci.

Anyway, all this stuff is best addressed either before or after the CF.
I'll kick the patch forward.  Thanks for looking.

-- 
Justin

Re: CI and test improvements

From

Peter Eisentraut

Date:

15 March 2023, 15:57:34

On 15.03.23 15:56, Justin Pryzby wrote:
> I'm surprised if there's any question about the merits of making
> documentation easily available for review.  Several people have agreed;
> one person mailed me privately specifically to ask how to show HTML docs
> on cirrusci.
> 
> Anyway, all this stuff is best addressed either before or after the CF.
> I'll kick the patch forward.  Thanks for looking.

I suppose this depends on what you want to use this for.  If your use is 
to prepare and lay out as much information as possible about a patch for 
a reviewer, some of your ideas make sense.

I'm using this primarily to quickly test local work in progress.  So I 
want a quick feedback cycle.  I don't need it to show me which HTML docs 
changed, for example.

So maybe there need to be different modes.

Re: CI and test improvements

From

Justin Pryzby

Date:

12 April 2023, 01:05:21

On Wed, Mar 15, 2023 at 04:57:34PM +0100, Peter Eisentraut wrote:
> On 15.03.23 15:56, Justin Pryzby wrote:
> > I'm surprised if there's any question about the merits of making
> > documentation easily available for review.  Several people have agreed;
> > one person mailed me privately specifically to ask how to show HTML docs
> > on cirrusci.
> > 
> > Anyway, all this stuff is best addressed either before or after the CF.
> > I'll kick the patch forward.  Thanks for looking.
> 
> I suppose this depends on what you want to use this for.  If your use is to
> prepare and lay out as much information as possible about a patch for a
> reviewer, some of your ideas make sense.
> 
> I'm using this primarily to quickly test local work in progress.  So I want
> a quick feedback cycle.  I don't need it to show me which HTML docs changed,
> for example.
> 
> So maybe there need to be different modes.

I'm opened to that - for example, mingw is currently opt-in.  Maybe this
should be a separate task - it was implemented like that based on an
earlier suggestion (and then changed back again based on another
suggestion).  The task could be triggered manually or by cfbot's
message.

But a primary goal for cirrus.yml was to allow developers to do the same
things as cfbot, and without everyone needing to reimplement it for
themselves.

You want quick feedback, like everyone else - but I doubt you disable
the documentation build when you don't need it, even though that would
shave off a whole minute.  And I doubt that you'd comment it out even
the documentation was built twice.

Anyway - I think this patch is probably waiting on Andres' patch to
"convert CompilerWarnings to meson".

> > 7e09035f588 WIP: ci/meson: allow showing only failed tests ..
> 
> I'm not sure I like this one.  I sometimes look up the logs of non-failed
> tests to compare them with failed tests, to get context to could lead to
> failures.  Maybe we can make this behavior adjustable. But I've not been
> bothered by the current behavior.

I suggest to try the patch; I doubt you'd prefer the existing behavior.

The patch is rebased now that meson is updated to avoid the windows
python warnings (thanks Andres).

-- 
Justin

Attachment

Re: CI and test improvements

From

Peter Eisentraut

Date:

03 July 2023, 08:37:43

On 12.04.23 03:05, Justin Pryzby wrote:
> The patch is rebased now that meson is updated to avoid the windows
> python warnings (thanks Andres).

To keep this moving along, I have committed

[PATCH 3/8] cirrus/freebsd: define ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS

Re: CI and test improvements

From

Justin Pryzby

Date:

12 July 2023, 05:56:17

On Tue, Jan 17, 2023 at 11:35:09AM -0600, Justin Pryzby wrote:
> However, this finds two real problems and one false-positive with
> missing regress/isolation tests:
> 
> $ for makefile in `find src contrib -name Makefile`; do for testname in `sed -r '/^(REGRESS|ISOLATION) =/!d; s///;
:l;/\\\\$/{s///; N; b l}; s/\n//g' "$makefile"`; do meson=${makefile%/Makefile}/meson.build; grep -Fw "$testname"
"$meson">/dev/null || echo "$testname is missing from $meson"; done; done
 

And, since 681d9e462:

security is missing from contrib/seg/meson.build

Re: CI and test improvements

From

Michael Paquier

Date:

12 July 2023, 06:07:53

On Wed, Jul 12, 2023 at 12:56:17AM -0500, Justin Pryzby wrote:
> And, since 681d9e462:
>
> security is missing from contrib/seg/meson.build

Ugh.  Good catch!
--
Michael

Attachment

signature.asc

Re: CI and test improvements

From

vignesh C

Date:

17 January 2024, 12:04:00

On Wed, 12 Jul 2023 at 11:38, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 12, 2023 at 12:56:17AM -0500, Justin Pryzby wrote:
> > And, since 681d9e462:
> >
> > security is missing from contrib/seg/meson.build
>
> Ugh.  Good catch!

Are we planning to do anything more in this thread, the thread has
been idle for more than 7 months. If nothing more is planned for this,
I'm planning to close this commitfest entry in this commitfest.

Regards,
Vignesh

Re: CI and test improvements

From

Michael Paquier

Date:

18 January 2024, 01:16:00

On Wed, Jan 17, 2024 at 05:34:00PM +0530, vignesh C wrote:
> Are we planning to do anything more in this thread, the thread has
> been idle for more than 7 months. If nothing more is planned for this,
> I'm planning to close this commitfest entry in this commitfest.

Oops, this went through the cracks.  security was still missing in
seg's meson.build, so I've just applied a patch to take care of it.
I am not spotting any other holes..
--
Michael

Attachment

signature.asc

Re: CI and test improvements

From

vignesh C

Date:

31 January 2024, 09:40:13

On Thu, 18 Jan 2024 at 06:46, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 17, 2024 at 05:34:00PM +0530, vignesh C wrote:
> > Are we planning to do anything more in this thread, the thread has
> > been idle for more than 7 months. If nothing more is planned for this,
> > I'm planning to close this commitfest entry in this commitfest.
>
> Oops, this went through the cracks.  security was still missing in
> seg's meson.build, so I've just applied a patch to take care of it.
> I am not spotting any other holes..

Are we planning to do anything more on this? I was not sure if we
should move this to next commitfest or return it.

Regards,
Vignesh

Re: CI and test improvements

From

Alvaro Herrera

Date:

31 January 2024, 10:59:21

On 2024-Jan-31, vignesh C wrote:

> Are we planning to do anything more on this? I was not sure if we
> should move this to next commitfest or return it.

Well, the patches don't apply anymore since .cirrus.tasks.yml has been
created.  However, I'm sure we still want [some of] the improvements
to the tests in [1].  I can volunteer to rebase the patches in time for the
March commitfest, if Justin is not available to do so.  If you can
please move it forward to the March cf and set it WoA, I'd appreciate
that.

Thanks

[1] https://postgr.es/m/ZA/+mKDX9zWfhD3v@telsasoft.com

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)

Re: CI and test improvements

From

Justin Pryzby

Date:

13 February 2024, 19:10:20

On Wed, Jan 31, 2024 at 11:59:21AM +0100, Alvaro Herrera wrote:
> On 2024-Jan-31, vignesh C wrote:
> 
> > Are we planning to do anything more on this? I was not sure if we
> > should move this to next commitfest or return it.
> 
> Well, the patches don't apply anymore since .cirrus.tasks.yml has been
> created.  However, I'm sure we still want [some of] the improvements
> to the tests in [1].  I can volunteer to rebase the patches in time for the
> March commitfest, if Justin is not available to do so.  If you can
> please move it forward to the March cf and set it WoA, I'd appreciate
> that.

The patches are rebased.  A couple were merged since I last rebased them
~10 months ago.  The freebsd patch will probably be obsoleted by a patch
of Thomas.

On Mon, Mar 13, 2023 at 07:39:52AM +0100, Peter Eisentraut wrote:
> On 03.02.23 15:26, Justin Pryzby wrote:
> > 9baf41674ad pg_upgrade: tap test: exercise --link and --clone
> 
> This seems like a good idea.

On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote:
> > [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone
> 
> I haven't been able to get any changes to the test run times outside
> of noise from this.  But some more coverage is sensible in any case.
> 
> I'm concerned that with this change, the only platform that tests
> --copy is Windows, but Windows has a separate code path for copy.  So
> we should leave one Unix platform to test --copy.  Maybe have FreeBSD
> test --link and macOS test --clone and leave the others with --copy?

I addressed Peter's comments, but haven't heard further.

The patch to show HTML docs artifacts may be waiting for Andres' patch
to convert CompilerWarnings to meson.

It may also be waiting on cfbot to avoid squishing all the patches
together.

I sent various patches to cfbot but haven't heard back.
https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com
https://www.postgresql.org/message-id/flat/20220623193125.GB22452@telsasoft.com
https://github.com/justinpryzby/cfbot/commits/master
https://github.com/macdice/cfbot/pulls

-- 
Justin

Attachment

Re: CI and test improvements

From

Peter Eisentraut

Date:

19 February 2024, 08:33:54

On 13.02.24 20:10, Justin Pryzby wrote:
> On Mon, Mar 13, 2023 at 07:39:52AM +0100, Peter Eisentraut wrote:
>> On 03.02.23 15:26, Justin Pryzby wrote:
>>> 9baf41674ad pg_upgrade: tap test: exercise --link and --clone
>> This seems like a good idea.
> On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote:
>>> [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone
>> I haven't been able to get any changes to the test run times outside
>> of noise from this.  But some more coverage is sensible in any case.
>>
>> I'm concerned that with this change, the only platform that tests
>> --copy is Windows, but Windows has a separate code path for copy.  So
>> we should leave one Unix platform to test --copy.  Maybe have FreeBSD
>> test --link and macOS test --clone and leave the others with --copy?
> I addressed Peter's comments, but haven't heard further.

Ok, I didn't see that my feedback had been addressed.  I have committed 
this patch.

Re: CI and test improvements

From

"Andrey M. Borodin"

Date:

08 April 2024, 14:54:10

> On 19 Feb 2024, at 11:33, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> Ok, I didn't see that my feedback had been addressed.  I have committed this patch.

Justin, Peter, I can't determine actual status of the CF entry [0]. May I ask someone of you to move patch to next CF
orclose as committed? 
Thanks!

Best regards, Andrey Borodin.
[0] https://commitfest.postgresql.org/47/3709/

Re: CI and test improvements

From

Michael Paquier

Date:

11 April 2024, 01:12:10

On Mon, Apr 08, 2024 at 05:54:10PM +0300, Andrey M. Borodin wrote:
> Justin, Peter, I can't determine actual status of the CF entry
> [0]. May I ask someone of you to move patch to next CF or close as
> committed?

0002 is the only thing committed as of 21a71648d39f.

I can see the value in 0001, but the implementation feels awkward.

0003 is wanted.

I am personally not sure about 0004 to upload doc artifacts.
Similarly.

0005 can already be done with a few clicks on the CI, and the previous
run may not be the only one that matters.

0006 makes the doc check phase more complex.

In all that, 0003 is something that we should move on with, at least.

Moving this entry to the next CF makes sense to me now, to give more
time to the other patches, and there's value to be extracted at quick
glance.
--
Michael

Attachment

signature.asc

Re: CI and test improvements

From

Justin Pryzby

Date:

12 June 2024, 13:10:23

On Fri, Jun 24, 2022 at 08:38:50AM +1200, Thomas Munro wrote:
> > I've also sent some patches to Thomas for cfbot to help progress some of these
> > patches (code coverage and documentation upload as artifacts).
> > https://github.com/justinpryzby/cfbot/commits/master
> 
> Thanks, sorry for lack of action, will get to these soon.

On Tue, Feb 13, 2024 at 01:10:21PM -0600, Justin Pryzby wrote:
> I sent various patches to cfbot but haven't heard back.

> https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com
> https://www.postgresql.org/message-id/flat/20220623193125.GB22452@telsasoft.com
> https://github.com/justinpryzby/cfbot/commits/master
> https://github.com/macdice/cfbot/pulls

@Thomas: ping

I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which
can make the builds 2x faster.

Attachment

Re: CI and test improvements

From

Nazir Bilal Yavuz

Date:

13 June 2024, 11:38:46

Hi,

Thanks for working on this!

On Wed, 12 Jun 2024 at 16:10, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Fri, Jun 24, 2022 at 08:38:50AM +1200, Thomas Munro wrote:
> > > I've also sent some patches to Thomas for cfbot to help progress some of these
> > > patches (code coverage and documentation upload as artifacts).
> > > https://github.com/justinpryzby/cfbot/commits/master
> >
> > Thanks, sorry for lack of action, will get to these soon.
>
> On Tue, Feb 13, 2024 at 01:10:21PM -0600, Justin Pryzby wrote:
> > I sent various patches to cfbot but haven't heard back.
>
> > https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com
> > https://www.postgresql.org/message-id/flat/20220623193125.GB22452@telsasoft.com
> > https://github.com/justinpryzby/cfbot/commits/master
> > https://github.com/macdice/cfbot/pulls
>
> @Thomas: ping
>
> I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which
> can make the builds 2x faster.

I applied 0001 and 0002 to see ccache support on Windows but the build
step failed with: 'ccache: error: No stats log has been configured'.
Perhaps you forgot to add 'CCACHE_STATSLOG: $CCACHE_DIR.stats.log' to
0002? After adding that line, CI finished successfully. And, I confirm
that the build step takes ~30 seconds now; it was ~90 seconds before
that.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Re: CI and test improvements

From

Justin Pryzby

Date:

13 June 2024, 11:56:20

On Thu, Jun 13, 2024 at 02:38:46PM +0300, Nazir Bilal Yavuz wrote:
> On Wed, 12 Jun 2024 at 16:10, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Fri, Jun 24, 2022 at 08:38:50AM +1200, Thomas Munro wrote:
> > > > I've also sent some patches to Thomas for cfbot to help progress some of these
> > > > patches (code coverage and documentation upload as artifacts).
> > > > https://github.com/justinpryzby/cfbot/commits/master
> > >
> > > Thanks, sorry for lack of action, will get to these soon.
> >
> > On Tue, Feb 13, 2024 at 01:10:21PM -0600, Justin Pryzby wrote:
> > > I sent various patches to cfbot but haven't heard back.
> >
> > > https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com
> > > https://www.postgresql.org/message-id/flat/20220623193125.GB22452@telsasoft.com
> > > https://github.com/justinpryzby/cfbot/commits/master
> > > https://github.com/macdice/cfbot/pulls
> >
> > @Thomas: ping
> >
> > I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which
> > can make the builds 2x faster.
> 
> I applied 0001 and 0002 to see ccache support on Windows but the build
> step failed with: 'ccache: error: No stats log has been configured'.
> Perhaps you forgot to add 'CCACHE_STATSLOG: $CCACHE_DIR.stats.log' to
> 0002?

Something like that - I put the line back.  I don't know if statslog
should be included in the patch, but it's useful for demonstrating that
it's working.

ccache should be installed in the image rather than re-installed on each
invocation.

-- 
Justin

Attachment

Re: CI and test improvements

From

Michael Paquier

Date:

13 June 2024, 23:05:09

On Thu, Jun 13, 2024 at 06:56:20AM -0500, Justin Pryzby wrote:
> On Thu, Jun 13, 2024 at 02:38:46PM +0300, Nazir Bilal Yavuz wrote:
>>> I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which
>>> can make the builds 2x faster.
>>
>> I applied 0001 and 0002 to see ccache support on Windows but the build
>> step failed with: 'ccache: error: No stats log has been configured'.
>> Perhaps you forgot to add 'CCACHE_STATSLOG: $CCACHE_DIR.stats.log' to
>> 0002?
>
> Something like that - I put the line back.  I don't know if statslog
> should be included in the patch, but it's useful for demonstrating that
> it's working.
>
> ccache should be installed in the image rather than re-installed on each
> invocation.

Getting a 90s -> 30s improvement is nice.  With such numbers, 0002 is
worth considering first.

+    ninja -C build |tee build.txt

In 0001, how OK is it to rely on the existence of tee for the VS2019
environments?  The base images include it, meaning that it is OK?

-    REM choco install -y --no-progress ...

I'd rather keep this line in 0002, as a matter of documentation.

+    set CC=c:\ProgramData\chocolatey\lib\ccache\tools\ccache-4.10-windows-x86_64\ccache.exe cl.exe

As of https://docs.mesa3d.org/meson.html#compiler-specification, using
CC is supported by meson (didn't know that), but shouldn't this be set
in the "env:" part of the VS2019 task in .cirrus.tasks.yml?
--
Michael

Attachment

signature.asc

Re: CI and test improvements

From

Nazir Bilal Yavuz

Date:

14 June 2024, 14:36:54

Hi,

On Thu, 13 Jun 2024 at 14:56, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> ccache should be installed in the image rather than re-installed on each
> invocation.

ccache is installed in the Windows VM images now [1]. It can be used
as 'set CC=ccache.exe cl.exe' in the Windows CI task.

[1] https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Re: CI and test improvements

From

Justin Pryzby

Date:

14 June 2024, 15:22:01

On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Thu, 13 Jun 2024 at 14:56, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > ccache should be installed in the image rather than re-installed on each
> > invocation.
> 
> ccache is installed in the Windows VM images now [1]. It can be used
> as 'set CC=ccache.exe cl.exe' in the Windows CI task.
> 
> [1] https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e

Thanks.  I think that works by using a "shim" created by choco in
C:\ProgramData\chocolatey\bin.

But going through that indirection seems to incur an extra 15sec of
compilation time; I think we'll want to do something to avoid that.

I'm not sure what the options are, like maybe installing ccache into a
fixed path like c:\ccache or installing a custom link, to a "pinned"
version of ccache.

-- 
Justin

Re: CI and test improvements

From

Andres Freund

Date:

14 June 2024, 15:34:37

Hi,

On June 14, 2024 8:22:01 AM PDT, Justin Pryzby <pryzby@telsasoft.com> wrote:
>On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote:
>> Hi,
>>
>> On Thu, 13 Jun 2024 at 14:56, Justin Pryzby <pryzby@telsasoft.com> wrote:
>> >
>> > ccache should be installed in the image rather than re-installed on each
>> > invocation.
>>
>> ccache is installed in the Windows VM images now [1]. It can be used
>> as 'set CC=ccache.exe cl.exe' in the Windows CI task.
>>
>> [1] https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e
>
>Thanks.  I think that works by using a "shim" created by choco in
>C:\ProgramData\chocolatey\bin.
>
>But going through that indirection seems to incur an extra 15sec of
>compilation time; I think we'll want to do something to avoid that.
>
>I'm not sure what the options are, like maybe installing ccache into a
>fixed path like c:\ccache or installing a custom link, to a "pinned"
>version of ccache.


Hm. There actually already is the mingw ccache installed. The images for mingw and msvc used to be separate (for
startupperformance when using containers), but we just merged them.  So it might be easiest to just explicitly use the
ccachefrom there (also an explicit path might be faster). Could you check if that's fast? If not, we can just install
thebinaries distributed by the project, it's just more work to keep up2date that way.  

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: CI and test improvements

From

Justin Pryzby

Date:

18 June 2024, 13:36:57

On Fri, Jun 14, 2024 at 08:34:37AM -0700, Andres Freund wrote:
> Hm. There actually already is the mingw ccache installed. The images for mingw and msvc used to be separate (for
startupperformance when using containers), but we just merged them.  So it might be easiest to just explicitly use the
ccachefrom there
 

> (also an explicit path might be faster).

I don't think the path search is significant; it's fast so long as
there's no choco stub involved.

> Could you check if that's fast?

Yes, it is.

> If not, we can just install the binaries distributed by the project, it's just more work to keep up2date that way. 

I guess you mean a separate line to copy choco's ccache.exe somewhere.

-- 
Justin

Attachment

Re: CI and test improvements

From

Andres Freund

Date:

18 June 2024, 21:25:46

Hi,

On 2024-06-18 08:36:57 -0500, Justin Pryzby wrote:
> On Fri, Jun 14, 2024 at 08:34:37AM -0700, Andres Freund wrote:
> > Hm. There actually already is the mingw ccache installed. The images for mingw and msvc used to be separate (for
startupperformance when using containers), but we just merged them.  So it might be easiest to just explicitly use the
ccachefrom there
 
> 
> > (also an explicit path might be faster).
> 
> I don't think the path search is significant; it's fast so long as
> there's no choco stub involved.

Comparatively it's definitely small, but I've seen it make a difference on
windows.


> > Could you check if that's fast?
> 
> Yes, it is.

Cool.


> > If not, we can just install the binaries distributed by the project, it's just more work to keep up2date that way.

> 
> I guess you mean a separate line to copy choco's ccache.exe somewhere.

I was thinking of alternatively just using the binaries upstream
distributes. But the mingw way seems easier.

Perhaps we should add an environment variable pointing to ccache to the image,
or such?


>    build_script: |
>      vcvarsall x64
> -    ninja -C build
> +    ninja -C build |tee build.txt
> +    REM Since pipes lose the exit status of the preceding command, rerun the compilation
> +    REM without the pipe, exiting now if it fails, to avoid trying to run checks
> +    ninja -C build > nul

Perhaps we could do something like
        (ninja -C build && touch success) | tee build.txt
        cat success
?


> +    CCACHE_MAXSIZE: "500M"

Does it have to be this big to work?


>    configure_script: |
>      vcvarsall x64
> -    meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true
-Dinjection_points=true-Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include
-DTAR=%TAR%-DPG_TEST_EXTRA="%PG_TEST_EXTRA%" build
 
> +    meson setup build --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true
-Dinjection_points=true-Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include
-DTAR=%TAR%-DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -Dc_args="/Z7"
 

A comment explaining why /Z7 is necessary would be good.



> From 3a399c6350ed2f341425431f184e382c3f46d981 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sat, 26 Feb 2022 19:39:10 -0600
> Subject: [PATCH 4/7] WIP: cirrus: upload changed html docs as artifacts
> 
> This could be done on the client side (cfbot).  One advantage of doing
> it here is that fewer docs are uploaded - many patches won't upload docs
> at all.
> 
> https://www.postgresql.org/message-id/flat/20220409021853.GP24419@telsasoft.com
> https://www.postgresql.org/message-id/CAB8KJ=i4qmEuopQ+PCSMBzGd4O-Xv0FCnC+q1x7hN9hsdvkBug@mail.gmail.com
> 
> https://cirrus-ci.com/task/5396696388599808

I still think that this runs the risk of increasing space utilization and thus
increase frequency of caches/artifacts being purged.



> +  # The commit that this branch is rebased on.  There's no easy way to find this.

I don't think that's true, IIRC I've complained about it before. We can do
something like:

major_num=$(grep PG_MAJORVERSION_NUM build/src/include/pg_config.h|awk '{print $3}');
echo major: $major_num;
if git rev-parse --quiet --verify origin/REL_${major_num}_STABLE > /dev/null ; then
    basebranch=origin/REL_${major_num}_STABLE;
else
    basebranch=origin/master;
fi;
echo base branch: $basebranch
base_commit=$(git merge-base HEAD $basebranch)



Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

06 August 2024, 19:10:15

On Tue, Jun 18, 2024 at 02:25:46PM -0700, Andres Freund wrote:
> > > If not, we can just install the binaries distributed by the project, it's just more work to keep up2date that
way.

> > 
> > I guess you mean a separate line to copy choco's ccache.exe somewhere.
> 
> I was thinking of alternatively just using the binaries upstream
> distributes. But the mingw way seems easier.
> 
> Perhaps we should add an environment variable pointing to ccache to the image,
> or such?

I guess you mean changing the OS image so there's a $CCACHE.
That sounds fine...

> > +    CCACHE_MAXSIZE: "500M"
> 
> Does it have to be this big to work?

It's using 150MB for an initial compilation, and maxsize will need to be
20% larger for it to not evict its cache before it can be used.

The other ccaches (except for mingw) seem to be several times larger
than what's needed for a single compilation, which makes sense to cache
across multiple branches (or even commits in a single branch), and for
cfbot.

> A comment explaining why /Z7 is necessary would be good.

Sure

> >    build_script: |
> >      vcvarsall x64
> > -    ninja -C build
> > +    ninja -C build |tee build.txt
> > +    REM Since pipes lose the exit status of the preceding command, rerun the compilation
> > +    REM without the pipe, exiting now if it fails, to avoid trying to run checks
> > +    ninja -C build > nul
> 
> Perhaps we could do something like
>         (ninja -C build && touch success) | tee build.txt
>         cat success
> ?

I don't know -- a pipe alone seems more direct than a
subshell+conditional+pipe written in cmd.exe, whose syntax is not well
known here.

Maybe you're suggesting to write 

sh -c "(ninja -C build && touch success) | tee build.txt ; cat ./success"

But that's another layer of complexity .. for what ?

> > Subject: [PATCH 4/7] WIP: cirrus: upload changed html docs as artifacts
> 
> I still think that this runs the risk of increasing space utilization and thus
> increase frequency of caches/artifacts being purged.

Maybe it should run on the local macs where I think you can control
that.

> > +  # The commit that this branch is rebased on.  There's no easy way to find this.
> 
> I don't think that's true, IIRC I've complained about it before. We can do
> something like:

cfbot now exposes it, so it'd be trivial for that case (although there
was no response here to my inquiries about that).  I'll revisit this in
the future, once other patches have progressed.

> major_num=$(grep PG_MAJORVERSION_NUM build/src/include/pg_config.h|awk '{print $3}');
> echo major: $major_num;
> if git rev-parse --quiet --verify origin/REL_${major_num}_STABLE > /dev/null ; then
>     basebranch=origin/REL_${major_num}_STABLE;
> else
>     basebranch=origin/master;
> fi;
> echo base branch: $basebranch
> base_commit=$(git merge-base HEAD $basebranch)

Attachment

Re: CI and test improvements

From

Andres Freund

Date:

16 November 2024, 23:22:57

Hi,

On 2024-08-06 14:10:15 -0500, Justin Pryzby wrote:
> > > +  # The commit that this branch is rebased on.  There's no easy way to find this.
> > 
> > I don't think that's true, IIRC I've complained about it before. We can do
> > something like:
> 
> cfbot now exposes it, so it'd be trivial for that case (although there
> was no response here to my inquiries about that).  I'll revisit this in
> the future, once other patches have progressed.

What do you mean with "cfbot now exposes that"?

Greetings,

Andres Freund

Re: CI and test improvements

From

Justin Pryzby

Date:

16 November 2024, 23:44:04

On Sat, Nov 16, 2024 at 03:22:57PM -0500, Andres Freund wrote:
> Hi,
> 
> On 2024-08-06 14:10:15 -0500, Justin Pryzby wrote:
> > > > +  # The commit that this branch is rebased on.  There's no easy way to find this.
> > > 
> > > I don't think that's true, IIRC I've complained about it before. We can do
> > > something like:
> > 
> > cfbot now exposes it, so it'd be trivial for that case (although there
> > was no response here to my inquiries about that).  I'll revisit this in
> > the future, once other patches have progressed.
> 
> What do you mean with "cfbot now exposes that"?

cfbot now makes one single merge commit, indicating the patch on which
the branch is based.
https://github.com/macdice/cfbot/pull/18

Note that my own cfbot branch evidently never got looked at since 2.5
years ago, which has been a discouraging factor for me.
20220623192343.GA22452@telsasoft.com