Re: Test to dump and restore objects left behind by regression - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Test to dump and restore objects left behind by regression |
Date | |
Msg-id | 4nuo6wzmidxv7q5qayywazp6iqk7b66psyznjvij3s3jgqoq5p@kxk76xvgwsna Whole thread Raw |
In response to | Re: Test to dump and restore objects left behind by regression (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses |
Re: Test to dump and restore objects left behind by regression
|
List | pgsql-hackers |
Hi, On 2025-04-03 19:14:02 +0200, Alvaro Herrera wrote: > On 2025-Apr-03, Andres Freund wrote: > > > I've increased the timeout even further, but I can't say that I am happy about > > the slowest test getting even slower. Adding test time in the serially slowest > > test is way worse than adding the same time in a concurrent test. > > Yeah. We discussed strategies to shorten the runtime, but the agreement > upthread was that we'd look for more elaborate ways to do that > afterwards. I think it's not good to just say "we'll maybe somehow fix it in the future", particularly if the solution is by no means agreed to. If this were a test that wasn't already the bottleneck for test cycles, it'd would be a different story, but it is. I'm already unhappy that 002_pg_upgrade got noticeably slower due to the stats dump changes. This made it even worse. > As I mentioned, I can see adding something like PG_TEST_EXCLUDE that we > could use to suppress this test on slow hosts. Would that work for you? Not particularly well. For one I don't actually think it's good to exclude it from something like valgrind testing. But it's also something that wouldn't really be usable locally, I think, given that we'd be expected to keep the test working. As outlined below, it really affects the test-hack-test loop times. > (We also discussed the fact that this was part of 002_pg_upgrade.pl > instead of being elsewhere. The reason is that this depends on the > regression tests having run, and this is the only TAP test that does > that. Well, this one and 027_stream_regress.pl which is even slower.) FWIW, for me 027 is actually considerably faster. In an cassert -O0 build (my normal development env, I find even -Og too problematic for debugging): pg_upgrade/002_pg_upgrade 96.61s recovery/027_stream_regress 66.04s After git revert 8806e4e8deb1e21715e031e17181d904825a410e abe56227b2e213755dd3e194c530f5f06467bd7c 172259afb563d35001410dc6daad78b250924038 pg_upgrade/002_pg_upgrade 75.09s Slower by 29%, far from the 3s increased time I saw mentioned somewhere. And this really affects the overall test time: All tests before: real 1m38.173s user 5m52.500s sys 4m23.574s All tests after: real 2m0.397s user 5m53.625s sys 4m30.518s The CPU time increase is rather minimal, but the wall clock time increase is 22%. 17: real 1m14.822s user 4m2.630s sys 3m22.384s We regressed wall clock time *60%* from 17->18. Some test cycle increase is reasonable and can largely be compensated with hardware, but this cycle we're growing way faster than hardware gets faster. I don't think that's sustainable. On valgrind it's also is very much true that 002_pg_upgrade is slower: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2025-04-04%2006%3A53%3A47&stg=recovery-check 276/333 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 10337.79s 22 subtestspassed 233/333 postgresql:recovery / recovery/027_stream_regress OK 7642.90s 9 subtestspassed To look at the times in a bit more detail: [09:19:31.371](6146.167s) ok 5 - regression tests pass ... # Running: pg_dump -Fd -j2 --no-sync -d port=31947 host=/tmp/HlLiWBwv6T dbname='regression' --create -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/regression.dump [09:26:35.178](201.199s) ok 6 - pg_dump on source instance # Running: pg_restore --create -j2 -d postgres /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/regression.dump [09:35:53.079](557.901s) ok 7 - pg_restore to destination instance # Running: pg_dump --no-sync -d port=31947 host=/tmp/HlLiWBwv6T dbname='regression' -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/src_dump.sql # Running: pg_dump --no-sync -d port=31949 host=/tmp/HlLiWBwv6T dbname='regression' -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/dest_dump.sql [09:43:18.485](445.407s) ok 8 - dump outputs from original and restored regression databases match # Running: pg_dumpall --no-sync --dbname port=31947 host=/tmp/HlLiWBwv6T dbname='postgres' --file /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/dump1.sql [09:43:45.895](27.410s) ok 9 - dump before running pg_upgrade I think the time for "pg_dump on source instance" is somewhat misleading, it includes initdb and starting the server. But it's pretty obvious that the newly added steps cost quite a bit of time. > > I suspect that the test will go a bit faster if log_statement weren't forced > > on, printing that many log lines, with context, does make valgrind slower, > > IME. But Cluster.pm forces it to on, and I suspect that putting a global > > log_statement=false into TEMP_CONFIG would have it's own disadvantages. > > I'm sure we can make this change as well somehow, overridding the > setting just 002_pg_upgrade.pl, as attached. I don't think it's > relevant for this particular test. The log files go from 21 MB to > 2.4 MB. It's not nothing ... That is a nice improvement. I have to run a few errands, can check how that affects valgrind times of a dump and restore of the regression db. Greetings, Andres Freund
pgsql-hackers by date: