Thread: Another modest proposal for reducing CLOBBER_CACHE_ALWAYS runtime
I noted that, while privileges.sql doesn't stand out in terms of runtime normally (it's only the fourth slowest test in its parallel group), it looks absolutely horrid in CLOBBER_CACHE_ALWAYS testing. On hyrax's latest run, it takes nearly 9000 seconds longer than the next-slowest member of its group. Remembering that the core regression tests are run thrice in a minimal buildfarm cycle, this test is single-handedly responsible for over seven hours of the 54 hour total build cycle. I dug into it and found that the core issue is much like that in opr_sanity.sql, namely that we're repeating this plpgsql function $bignum times: CREATE FUNCTION leak(integer,integer) RETURNS boolean AS $$begin return $1 < $2; end$$ LANGUAGE plpgsql immutable; (I wonder whether the planner needs to invoke this function quite so many times during selectivity estimation. But, again, improving that seems like a task for some other day.) Now, as far as I can see, this function definition isn't doing anything we can't do with an alias for the underlying int4lt function: the fact that the implementation is in plpgsql shouldn't matter at all for the purposes of this test. So I replaced it, as per the attached patch. On my machine, the time to run privileges.sql under debug_invalidate_system_caches_always = 1 drops from real 293m31.054s to real 1m47.807s Yes, really. regards, tom lane diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out index 1b4fc16644..83cff902f3 100644 --- a/src/test/regress/expected/privileges.out +++ b/src/test/regress/expected/privileges.out @@ -38,6 +38,11 @@ ALTER GROUP regress_priv_group2 ADD USER regress_priv_user2; -- duplicate NOTICE: role "regress_priv_user2" is already a member of role "regress_priv_group2" ALTER GROUP regress_priv_group2 DROP USER regress_priv_user2; GRANT regress_priv_group2 TO regress_priv_user4 WITH ADMIN OPTION; +-- prepare non-leakproof function for later +CREATE FUNCTION leak(integer,integer) RETURNS boolean + AS 'int4lt' + LANGUAGE internal IMMUTABLE STRICT; -- but deliberately not LEAKPROOF +ALTER FUNCTION leak(integer,integer) OWNER TO regress_priv_user1; -- test owner privileges SET SESSION AUTHORIZATION regress_priv_user1; SELECT session_user, current_user; @@ -233,9 +238,6 @@ ALTER TABLE atest12 SET (autovacuum_enabled = off); SET default_statistics_target = 10000; VACUUM ANALYZE atest12; RESET default_statistics_target; -CREATE FUNCTION leak(integer,integer) RETURNS boolean - AS $$begin return $1 < $2; end$$ - LANGUAGE plpgsql immutable; CREATE OPERATOR <<< (procedure = leak, leftarg = integer, rightarg = integer, restrict = scalarltsel); -- views with leaky operator diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql index 013bc95c74..3d1a1db987 100644 --- a/src/test/regress/sql/privileges.sql +++ b/src/test/regress/sql/privileges.sql @@ -45,6 +45,12 @@ ALTER GROUP regress_priv_group2 ADD USER regress_priv_user2; -- duplicate ALTER GROUP regress_priv_group2 DROP USER regress_priv_user2; GRANT regress_priv_group2 TO regress_priv_user4 WITH ADMIN OPTION; +-- prepare non-leakproof function for later +CREATE FUNCTION leak(integer,integer) RETURNS boolean + AS 'int4lt' + LANGUAGE internal IMMUTABLE STRICT; -- but deliberately not LEAKPROOF +ALTER FUNCTION leak(integer,integer) OWNER TO regress_priv_user1; + -- test owner privileges SET SESSION AUTHORIZATION regress_priv_user1; @@ -166,9 +172,6 @@ SET default_statistics_target = 10000; VACUUM ANALYZE atest12; RESET default_statistics_target; -CREATE FUNCTION leak(integer,integer) RETURNS boolean - AS $$begin return $1 < $2; end$$ - LANGUAGE plpgsql immutable; CREATE OPERATOR <<< (procedure = leak, leftarg = integer, rightarg = integer, restrict = scalarltsel);
On Mon, 10 May 2021 at 18:04, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I dug into it and found that the core issue is much like that in > opr_sanity.sql, namely that we're repeating this plpgsql function > $bignum times: > > CREATE FUNCTION leak(integer,integer) RETURNS boolean > AS $$begin return $1 < $2; end$$ > LANGUAGE plpgsql immutable; > real 293m31.054s > to > real 1m47.807s > > Yes, really. That's quite impressive. I've very much in favour of this change. Making it more realistic to run the regression tests on a CLOBBER_CACHE_ALWAYS builds before a commit is a very worthy goal and this is a big step towards that. Nice. David
David Rowley <dgrowleyml@gmail.com> writes: > On Mon, 10 May 2021 at 18:04, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> real 293m31.054s >> to >> real 1m47.807s >> Yes, really. > That's quite impressive. > I've very much in favour of this change. Making it more realistic to > run the regression tests on a CLOBBER_CACHE_ALWAYS builds before a > commit is a very worthy goal and this is a big step towards that. > Nice. It occurred to me to check hyrax's results on the older branches (it also tests v12 and v13), and the slope of the curve is bad: Branch Latest "check" phase runtime HEAD 13:56:11 v13 11:00:33 v12 6:05:30 Seems like we'd better do something about that. About 2.5 hours worth of the jump from 12 to 13 can be blamed on the privileges test, looks like. The slowdown in that evidently can be blamed on 0c882e52a86, which added this: +-- results below depend on having quite accurate stats for atest12 +SET default_statistics_target = 10000; VACUUM ANALYZE atest12; +RESET default_statistics_target; The slow queries in that test all cause the planner to apply the "leak()" function to every histogram entry for atest12, so this one change caused a 100X increase in the amount of work there. I find it a bit remarkable that we barely noticed that in normal operation. In CCA mode, though, each leak() call takes circa 100ms (at least on my workstation), so kaboom. Anyway, I'm now feeling that what I should do with this patch is wait for the release cycle to finish and then apply it to v13 as well as HEAD. The other patch I proposed, to cut opr_sanity's runtime, may be too invasive for back-patch. More generally, there is an upward creep in the test runtimes that doesn't seem to be entirely accounted for by our constantly adding more tests. I am suspicious that plpgsql may be largely to blame for this. The smoking gun I found for that is the runtimes for the plpgsql_control test, which hasn't changed *at all* since it was added in v11; but hyrax shows these runtimes: HEAD: test plpgsql_control ... ok 56105 ms v13: test plpgsql_control ... ok 46879 ms v12: test plpgsql_control ... ok 30809 ms In normal builds that test's time has held pretty steady. So I'm not sure what's underneath this rock, but I plan to try to find out. regards, tom lane
On 5/10/21 2:03 AM, Tom Lane wrote: > I noted that, while privileges.sql doesn't stand out in terms of > runtime normally (it's only the fourth slowest test in its > parallel group), it looks absolutely horrid in CLOBBER_CACHE_ALWAYS > testing. On hyrax's latest run, it takes nearly 9000 seconds longer > than the next-slowest member of its group. Remembering that the > core regression tests are run thrice in a minimal buildfarm cycle, > this test is single-handedly responsible for over seven hours of the > 54 hour total build cycle. FWIW, I would normally suggest a CCA enabled animal disable some of the tests so that it doesn't run the core tests thrice. The easiest way to do that is: --skip-steps='check pg_upgrade-check' Even so your point is well taken. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com