Some other CLOBBER_CACHE_ALWAYS culprits - Mailing list pgsql-hackers

From Tom Lane
Subject Some other CLOBBER_CACHE_ALWAYS culprits
Date
Msg-id 852314.1620749013@sss.pgh.pa.us
Whole thread Raw
Responses Re: Some other CLOBBER_CACHE_ALWAYS culprits
List pgsql-hackers
In some recent threads I complained about how CLOBBER_CACHE_ALWAYS
test runs have gotten markedly slower over the past couple of release
cycles [1][2][3].  It'd be impossibly time-consuming to investigate the
causes by repeating the whole test corpus, but I've had some success in
bisecting while measuring the runtime of just a single test script.
In this report I'm looking at src/pl/plpgsql/src/sql/plpgsql_control.sql,
which is a useful candidate because it hasn't changed at all since v11.
Despite that, hyrax's latest runs show these runtimes:

HEAD:
test plpgsql_control              ... ok        56105 ms
REL_13_STABLE:
test plpgsql_control              ... ok        46879 ms
REL_12_STABLE:
test plpgsql_control              ... ok        30809 ms

so we have clearly made CCA runs a whole lot worse since v12.
(Non-CCA buildfarm members show runtimes that are about the same
across all three branches.)

I've reproduced (some of) these results on my shiny new M1 mini,
which is a tad faster than hyrax's host: it can do the test on HEAD
(049e1e2ed) in 15.413s.  (Note: this, and the numbers following, are
median-of-3-runs; the run variance is enough that I wouldn't trust
them to less than a tenth of a second.)  The run time at 615cebc94
(v12 branchoff point) is 11.861s.  Bisecting, I found that there were
three commits that accounted for almost all of the slowdown since v12:

0d861bbb7  Add deduplication to nbtree
11.836s -> 12.339s
(that's runtime on the preceding commit -> runtime on this commit)

8f59f6b9c(+fbc7a7160)  Improve performance of "simple expressions" in PL/pgSQL
12.334s -> 14.158s

911e70207  Implement operator class parameters
14.263s -> 15.415s

One thing that confuses me, though, is that all of these are v13-era
commits (they all went into the tree during March 2020).  I don't
see any real difference in the runtime from the v13 branchoff point
to now, which doesn't square with hyrax's results.  Could there be
that much inter-platform variation in the overhead of CCA?  It might
be useful for somebody with patience and a fast Intel machine to try
to replicate these results.

Anyway, it seems like these three features deserve some study as to
why they caused so much slowdown under CCA.  It's not so surprising
that 8f59f6b9c would have an effect on a test of plpgsql control
logic, but I find it surprising and rather disturbing that either
of the others would.

BTW, I was also tracking initdb runtime under CCA while I did this,
and while I didn't formally bisect on that basis, I did notice that
911e70207 had quite a negative impact on that too: 180s -> 195s.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/242172.1620494497%40sss.pgh.pa.us#eab25bb83bdcdd0f58b2d712b4971fcd
[2] https://www.postgresql.org/message-id/flat/292305.1620503097%40sss.pgh.pa.us
[3] https://www.postgresql.org/message-id/flat/575884.1620626638%40sss.pgh.pa.us



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Performance degradation of REFRESH MATERIALIZED VIEW
Next
From: Bharath Rupireddy
Date:
Subject: Re: Enhanced error message to include hint messages for redundant options error