Thread: Another modest proposal for reducing CLOBBER_CACHE_ALWAYS runtime

Another modest proposal for reducing CLOBBER_CACHE_ALWAYS runtime

From
Tom Lane
Date:
I noted that, while privileges.sql doesn't stand out in terms of
runtime normally (it's only the fourth slowest test in its
parallel group), it looks absolutely horrid in CLOBBER_CACHE_ALWAYS
testing.  On hyrax's latest run, it takes nearly 9000 seconds longer
than the next-slowest member of its group.  Remembering that the
core regression tests are run thrice in a minimal buildfarm cycle,
this test is single-handedly responsible for over seven hours of the
54 hour total build cycle.

I dug into it and found that the core issue is much like that in
opr_sanity.sql, namely that we're repeating this plpgsql function
$bignum times:

CREATE FUNCTION leak(integer,integer) RETURNS boolean
  AS $$begin return $1 < $2; end$$
  LANGUAGE plpgsql immutable;

(I wonder whether the planner needs to invoke this function
quite so many times during selectivity estimation.  But,
again, improving that seems like a task for some other day.)

Now, as far as I can see, this function definition isn't doing
anything we can't do with an alias for the underlying int4lt
function: the fact that the implementation is in plpgsql
shouldn't matter at all for the purposes of this test.
So I replaced it, as per the attached patch.

On my machine, the time to run privileges.sql under
debug_invalidate_system_caches_always = 1
drops from

real    293m31.054s
to
real    1m47.807s

Yes, really.

            regards, tom lane

diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index 1b4fc16644..83cff902f3 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -38,6 +38,11 @@ ALTER GROUP regress_priv_group2 ADD USER regress_priv_user2;    -- duplicate
 NOTICE:  role "regress_priv_user2" is already a member of role "regress_priv_group2"
 ALTER GROUP regress_priv_group2 DROP USER regress_priv_user2;
 GRANT regress_priv_group2 TO regress_priv_user4 WITH ADMIN OPTION;
+-- prepare non-leakproof function for later
+CREATE FUNCTION leak(integer,integer) RETURNS boolean
+  AS 'int4lt'
+  LANGUAGE internal IMMUTABLE STRICT;  -- but deliberately not LEAKPROOF
+ALTER FUNCTION leak(integer,integer) OWNER TO regress_priv_user1;
 -- test owner privileges
 SET SESSION AUTHORIZATION regress_priv_user1;
 SELECT session_user, current_user;
@@ -233,9 +238,6 @@ ALTER TABLE atest12 SET (autovacuum_enabled = off);
 SET default_statistics_target = 10000;
 VACUUM ANALYZE atest12;
 RESET default_statistics_target;
-CREATE FUNCTION leak(integer,integer) RETURNS boolean
-  AS $$begin return $1 < $2; end$$
-  LANGUAGE plpgsql immutable;
 CREATE OPERATOR <<< (procedure = leak, leftarg = integer, rightarg = integer,
                      restrict = scalarltsel);
 -- views with leaky operator
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index 013bc95c74..3d1a1db987 100644
--- a/src/test/regress/sql/privileges.sql
+++ b/src/test/regress/sql/privileges.sql
@@ -45,6 +45,12 @@ ALTER GROUP regress_priv_group2 ADD USER regress_priv_user2;    -- duplicate
 ALTER GROUP regress_priv_group2 DROP USER regress_priv_user2;
 GRANT regress_priv_group2 TO regress_priv_user4 WITH ADMIN OPTION;

+-- prepare non-leakproof function for later
+CREATE FUNCTION leak(integer,integer) RETURNS boolean
+  AS 'int4lt'
+  LANGUAGE internal IMMUTABLE STRICT;  -- but deliberately not LEAKPROOF
+ALTER FUNCTION leak(integer,integer) OWNER TO regress_priv_user1;
+
 -- test owner privileges

 SET SESSION AUTHORIZATION regress_priv_user1;
@@ -166,9 +172,6 @@ SET default_statistics_target = 10000;
 VACUUM ANALYZE atest12;
 RESET default_statistics_target;

-CREATE FUNCTION leak(integer,integer) RETURNS boolean
-  AS $$begin return $1 < $2; end$$
-  LANGUAGE plpgsql immutable;
 CREATE OPERATOR <<< (procedure = leak, leftarg = integer, rightarg = integer,
                      restrict = scalarltsel);


Re: Another modest proposal for reducing CLOBBER_CACHE_ALWAYS runtime

From
David Rowley
Date:
On Mon, 10 May 2021 at 18:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I dug into it and found that the core issue is much like that in
> opr_sanity.sql, namely that we're repeating this plpgsql function
> $bignum times:
>
> CREATE FUNCTION leak(integer,integer) RETURNS boolean
>   AS $$begin return $1 < $2; end$$
>   LANGUAGE plpgsql immutable;

> real    293m31.054s
> to
> real    1m47.807s
>
> Yes, really.

That's quite impressive.

I've very much in favour of this change. Making it more realistic to
run the regression tests on a CLOBBER_CACHE_ALWAYS builds before a
commit is a very worthy goal and this is a big step towards that.
Nice.

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Mon, 10 May 2021 at 18:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> real    293m31.054s
>> to
>> real    1m47.807s
>> Yes, really.

> That's quite impressive.

> I've very much in favour of this change. Making it more realistic to
> run the regression tests on a CLOBBER_CACHE_ALWAYS builds before a
> commit is a very worthy goal and this is a big step towards that.
> Nice.

It occurred to me to check hyrax's results on the older branches
(it also tests v12 and v13), and the slope of the curve is bad:

Branch        Latest "check" phase runtime

HEAD        13:56:11
v13        11:00:33
v12        6:05:30

Seems like we'd better do something about that.

About 2.5 hours worth of the jump from 12 to 13 can be blamed on
the privileges test, looks like.  The slowdown in that evidently
can be blamed on 0c882e52a86, which added this:

+-- results below depend on having quite accurate stats for atest12
+SET default_statistics_target = 10000;
 VACUUM ANALYZE atest12;
+RESET default_statistics_target;

The slow queries in that test all cause the planner to apply the
"leak()" function to every histogram entry for atest12, so this
one change caused a 100X increase in the amount of work there.
I find it a bit remarkable that we barely noticed that in normal
operation.  In CCA mode, though, each leak() call takes circa 100ms
(at least on my workstation), so kaboom.

Anyway, I'm now feeling that what I should do with this patch
is wait for the release cycle to finish and then apply it to
v13 as well as HEAD.  The other patch I proposed, to cut
opr_sanity's runtime, may be too invasive for back-patch.

More generally, there is an upward creep in the test runtimes
that doesn't seem to be entirely accounted for by our constantly
adding more tests.  I am suspicious that plpgsql may be largely
to blame for this.  The smoking gun I found for that is the
runtimes for the plpgsql_control test, which hasn't changed
*at all* since it was added in v11; but hyrax shows these
runtimes:

HEAD:
test plpgsql_control              ... ok        56105 ms
v13:
test plpgsql_control              ... ok        46879 ms
v12:
test plpgsql_control              ... ok        30809 ms

In normal builds that test's time has held pretty steady.
So I'm not sure what's underneath this rock, but I plan
to try to find out.

            regards, tom lane



Re: Another modest proposal for reducing CLOBBER_CACHE_ALWAYS runtime

From
Andrew Dunstan
Date:
On 5/10/21 2:03 AM, Tom Lane wrote:
> I noted that, while privileges.sql doesn't stand out in terms of
> runtime normally (it's only the fourth slowest test in its
> parallel group), it looks absolutely horrid in CLOBBER_CACHE_ALWAYS
> testing.  On hyrax's latest run, it takes nearly 9000 seconds longer
> than the next-slowest member of its group.  Remembering that the
> core regression tests are run thrice in a minimal buildfarm cycle,
> this test is single-handedly responsible for over seven hours of the
> 54 hour total build cycle.


FWIW, I would normally suggest a CCA enabled animal disable some of the
tests so that it doesn't run the core tests thrice. The easiest way to
do that is: --skip-steps='check pg_upgrade-check'


Even so your point is well taken.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com