On Sat, Feb 12, 2022 at 02:26:25PM -0800, Andres Freund wrote:
> On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote:
> > I had some success with that, but it doesn't seem to be significantly faster -
> > it looks a lot like the tests are not actually running in parallel.
Note that the total test time is close to the sum of the individual test times.
But I think that may be an artifact of how prove is showing/attributing times
to each test (which, if so, is misleading).
> Note that prove unfortunately serializes the test output to be in the order it
> started them, even when tests run concurrently. Extremely unhelpful, but ...
Are you sure ? When I run it locally, I see:
rm -fr src/test/recovery/tmp_check ; time PERL5LIB=`pwd`/src/test/perl TESTDIR=`pwd`/src/test/recovery
PATH=`pwd`/tmp_install/usr/local/pgsql/bin:$PATHPG_REGRESS=`pwd`/src/test/regress/pg_regress
REGRESS_SHLIB=`pwd`/src/test/regress/regress.soprove --time -j4 --ext '*.pl' `find src -name t`
...
[15:34:48] src/bin/scripts/t/101_vacuumdb_all.pl ....................... ok 104 ms ( 0.00 usr 0.00 sys + 2.35
cusr 0.47 csys = 2.82 CPU)
[15:34:49] src/bin/scripts/t/090_reindexdb.pl .......................... ok 8894 ms ( 0.06 usr 0.01 sys + 14.45
cusr 3.38 csys = 17.90 CPU)
[15:34:50] src/bin/pg_config/t/001_pg_config.pl ........................ ok 79 ms ( 0.00 usr 0.01 sys + 0.23
cusr 0.04 csys = 0.28 CPU)
[15:34:50] src/bin/pg_waldump/t/001_basic.pl ........................... ok 35 ms ( 0.00 usr 0.00 sys + 0.26
cusr 0.02 csys = 0.28 CPU)
[15:34:51] src/bin/pg_test_fsync/t/001_basic.pl ........................ ok 100 ms ( 0.01 usr 0.00 sys + 0.24
cusr 0.04 csys = 0.29 CPU)
[15:34:51] src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl ........ ok 177 ms ( 0.02 usr 0.00 sys + 0.26
cusr 0.03 csys = 0.31 CPU)
[15:34:55] src/bin/scripts/t/100_vacuumdb.pl ........................... ok 11267 ms ( 0.12 usr 0.04 sys + 13.47
cusr 3.20 csys = 16.83 CPU)
[15:34:57] src/bin/scripts/t/102_vacuumdb_stages.pl .................... ok 5802 ms ( 0.06 usr 0.01 sys + 7.70
cusr 1.37 csys = 9.14 CPU)
...
=> scripts/ stuff, followed by other stuff, followed by more, slower, scripts/ stuff.
But I never saw that in cirrus.
> Isn't this kind of a good test time? I thought earlier your alltaptests target
> took a good bit longer?
The original alltaptests runs in 16m 21s.
https://cirrus-ci.com/task/6679061752709120
2 weeks ago, it was ~14min with your patch to cache initdb.
https://cirrus-ci.com/task/5439320633901056
As I recall, that didn't seem to improve runtime when combined with my parallel
patch.
> One nice bit is that the output is a *lot* easier to read.
>
> You could try improving the total time by having prove remember slow tests and
> use that file to run the slowest tests first next time. --state slow,save or
> such I believe. Of course we'd have to save that state file...
In a test, this hurt rather than helped (13m 42s).
https://cirrus-ci.com/task/6359167186239488
I'm not surprised - it makes sense to run 10 fast tests at once, but usually
doesn't make sense to run 10 slow tests tests at once (at least a couple of
which are doing something intensive). It was faster (12m16s) to do it
backwards (fastest tests first).
https://cirrus-ci.com/task/5745115443494912
BTW, does it make sense to remove test_regress_parallel_script ? The
pg_upgrade run would do the same things, no ? If so, it might make sense to
run that first. OTOH, you suggested to run the upgrade tests with checksums
enabled, which seems like a good idea.
Note that in the attached patches, I changed the msvc "warnings" to use "tee".
I don't know how to fix the pipeline test in a less hacky way...
You said the docs build should be a separate task, but then said that it'd be
okay to remove the dependency. So I did it both ways. There's currently some
duplication between the docs patch and code coverage patch.
--
Justin