Thread: Some other CLOBBER_CACHE_ALWAYS culprits
In some recent threads I complained about how CLOBBER_CACHE_ALWAYS
test runs have gotten markedly slower over the past couple of release
cycles [1][2][3]. It'd be impossibly time-consuming to investigate
the causes by repeating the whole test corpus, but I've had some
success in bisecting while measuring the runtime of just a single test
script. In this report I'm looking at
src/pl/plpgsql/src/sql/plpgsql_control.sql, which is a useful
candidate because it hasn't changed at all since v11. Despite that,
hyrax's latest runs show these runtimes:

HEAD:          test plpgsql_control          ... ok        56105 ms
REL_13_STABLE: test plpgsql_control          ... ok        46879 ms
REL_12_STABLE: test plpgsql_control          ... ok        30809 ms

so we have clearly made CCA runs a whole lot worse since v12.
(Non-CCA buildfarm members show runtimes that are about the same
across all three branches.)

I've reproduced (some of) these results on my shiny new M1 mini,
which is a tad faster than hyrax's host: it can do the test on HEAD
(049e1e2ed) in 15.413s. (Note: this, and the numbers following, are
median-of-3-runs; the run variance is enough that I wouldn't trust
them to less than a tenth of a second.) The run time at 615cebc94
(v12 branchoff point) is 11.861s. Bisecting, I found that there were
three commits that accounted for almost all of the slowdown since v12:

0d861bbb7 Add deduplication to nbtree
	11.836s -> 12.339s
	(that's runtime on the preceding commit -> runtime on this commit)

8f59f6b9c (+fbc7a7160) Improve performance of "simple expressions" in PL/pgSQL
	12.334s -> 14.158s

911e70207 Implement operator class parameters
	14.263s -> 15.415s

One thing that confuses me, though, is that all of these are v13-era
commits (they all went into the tree during March 2020). I don't see
any real difference in the runtime from the v13 branchoff point to
now, which doesn't square with hyrax's results. Could there be that
much inter-platform variation in the overhead of CCA?
It might be useful for somebody with patience and a fast Intel machine
to try to replicate these results.

Anyway, it seems like these three features deserve some study as to
why they caused so much slowdown under CCA. It's not so surprising
that 8f59f6b9c would have an effect on a test of plpgsql control
logic, but I find it surprising and rather disturbing that either of
the others would.

BTW, I was also tracking initdb runtime under CCA while I did this,
and while I didn't formally bisect on that basis, I did notice that
911e70207 had quite a negative impact on that too: 180s -> 195s.

			regards, tom lane

[1] https://www.postgresql.org/message-id/flat/242172.1620494497%40sss.pgh.pa.us#eab25bb83bdcdd0f58b2d712b4971fcd
[2] https://www.postgresql.org/message-id/flat/292305.1620503097%40sss.pgh.pa.us
[3] https://www.postgresql.org/message-id/flat/575884.1620626638%40sss.pgh.pa.us
Hi,

On 2021-05-11 12:03:33 -0400, Tom Lane wrote:
> In some recent threads I complained about how CLOBBER_CACHE_ALWAYS
> test runs have gotten markedly slower over the past couple of release
> cycles [1][2][3].

I wonder if the best way to attack this in a more fundamental manner
would be to handle nested invalidations differently than we do today.
Not just for CCA/CCR performance, but also to make invalidations
easier to understand in general.

Right now, for CCA, we'll often invalidate all the caches dozens of
times for a single syscache lookup, often rebuilding a lot of the
entries over and over again even when they are not accessed during the
lookup (because relcache eagerly rebuilds cache entries). Of course
that's terribly expensive. It's something like
O(lookups * cache accesses during lookup * total cache entries) I think?

IMO the problem largely stems from eagerly rebuilding *all* relcache
entries during invalidation processing. Something triggers
InvalidateSystemCaches(). That in turn triggers RelationBuildDesc()
for all relations, which triggers a lot of syscache lookups, which
trigger a lot of relcache lookups, ...

And that's just during the InvalidateSystemCaches(). Most subsequent
syscache lookups will be cache misses too (unless accessed during the
relcache rebuilds) - each syscache miss will trigger a few system
relations to be locked, triggering separate
InvalidateSystemCaches() calls.

If we split cache invalidation into separate invalidation and
cache-rebuild phases, we'd likely be a lot better off, I think, by
being able to avoid the repeated rebuilds of cache entries that are
never accessed during invalidation. I'd prototyped a relcache version
of this in
https://postgr.es/m/20180829083730.n645apqhb2gyih3g%40alap3.anarazel.de
but it seems like it might be possible to generalize?

> so we have clearly made CCA runs a whole lot worse since v12.
> (Non-CCA buildfarm members show runtimes that are about the same
> across all three branches.)
> I've reproduced (some of) these results on my shiny new M1 mini,
> which is a tad faster than hyrax's host: it can do the test on HEAD
> (049e1e2ed) in 15.413s. (Note: this, and the numbers following, are
> median-of-3-runs; the run variance is enough that I wouldn't trust
> them to less than a tenth of a second.) The run time at 615cebc94
> (v12 branchoff point) is 11.861s. Bisecting, I found that there were
> three commits that accounted for almost all of the slowdown since v12:
>
> 0d861bbb7 Add deduplication to nbtree
> 	11.836s -> 12.339s
> 	(that's runtime on the preceding commit -> runtime on this commit)

Hm. The most likely explanation seems to be that this shows that index
accesses without using deduplication are slightly more expensive due
to the change? System tables never use the deduplication stuff
(cf _bt_allequalimage())...

Greetings,

Andres Freund
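The multiplicative blowup described upthread can be sketched with a
back-of-envelope model (plain Python, not PostgreSQL code; the
function names and example numbers here are invented for illustration):

```python
# Rough model of CCA cost: if every syscache lookup forces a full
# invalidation, and each invalidation eagerly rebuilds every surviving
# relcache entry, total rebuild work scales with the product of all
# three factors.  Splitting "mark invalid" from "rebuild" would drop
# the surviving_entries factor for entries that are never accessed.

def eager_rebuild_work(lookups, accesses_per_lookup, surviving_entries):
    """Rebuilds performed when each nested cache access re-clobbers and
    eagerly rebuilds all surviving relcache entries."""
    return lookups * accesses_per_lookup * surviving_entries

def lazy_rebuild_work(lookups, accesses_per_lookup):
    """Rebuilds if invalidation only marks entries invalid and each
    rebuild happens on actual access."""
    return lookups * accesses_per_lookup

print(eager_rebuild_work(100, 10, 50))  # 50000 rebuilds
print(lazy_rebuild_work(100, 10))       # 1000 rebuilds
```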
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-11 12:03:33 -0400, Tom Lane wrote:
>> In some recent threads I complained about how CLOBBER_CACHE_ALWAYS
>> test runs have gotten markedly slower over the past couple of release
>> cycles [1][2][3].

> I wonder if the best way to attack this in a more fundamental manner would be
> to handle nested invalidations different than we do today. Not just for
> CCA/CCR performance, but also to make invalidations easier to understand in
> general.

I spent some time thinking along those lines too, but desisted after
concluding that that would fundamentally break the point of CCA
testing, namely to be sure we survive when a cache flush occurs at
$any-random-point. Sure, in practice it will not be the case that a
flush occurs at EVERY random point. But I think if you try to
optimize away a rebuild at point B on the grounds that you just did
one at point A, you will fail to cover the scenario where flush
requests arrive at exactly points A and B.

> IMO the problem largely stems from eagerly rebuilding *all* relcache entries
> during invalidation processing.

Uh, we don't do that; only for relations that are pinned, which we
know are being used.

What it looked like to me, in an admittedly cursory bit of perf
testing, was that most of the cycles were going into fetching cache
entries from catalogs over and over. But it's hard to avoid that.

I did wonder for a bit about doing something like moving cache entries
to another physical place rather than dropping them. I don't really
like that either though, because then the behavior that CCA is testing
really has not got that much at all to do with real system behavior.

			regards, tom lane
Hi,

On 2021-05-11 19:30:48 -0400, Tom Lane wrote:
>> IMO the problem largely stems from eagerly rebuilding *all* relcache entries
>> during invalidation processing.
>
> Uh, we don't do that; only for relations that are pinned, which we
> know are being used.

Sorry, all surviving relcache entries - but that's typically quite a few.

> I spent some time thinking along those lines too, but desisted after
> concluding that that would fundamentally break the point of CCA
> testing, namely to be sure we survive when a cache flush occurs at
> $any-random-point.

Why would rebuilding non-accessed relcache entries over and over help
with that? I am not proposing that we do not mark all cache entries
as invalid, or that we do not rebuild tables that aren't accessed.

During an extremely trivial query from a user defined table ('blarg'),
here's the top 10 RelationBuildDesc() calls:

    344 rebuild pg_attrdef
    274 rebuild pg_opclass
    274 rebuild pg_amproc
    260 rebuild pg_index
    243 rebuild pg_am
    236 rebuild pg_attrdef_adrelid_adnum_index
    236 rebuild blarg
     74 rebuild pg_namespace
     52 rebuild pg_statistic
     37 rebuild pg_tablespace
134.420 ms

Here's the same when joining two tables:

   5828 rebuild pg_opclass
   2897 rebuild pg_amop
   2250 rebuild pg_cast
   2086 rebuild pg_amproc
   1465 rebuild pg_statistic
   1274 rebuild pg_index
    936 rebuild pg_attrdef
    646 rebuild pg_operator
    619 rebuild pg_am
    518 rebuild pg_tablespace
1414.886 ms

three tables:

  16614 rebuild pg_opclass
   7787 rebuild pg_amop
   6750 rebuild pg_cast
   5388 rebuild pg_amproc
   5141 rebuild pg_statistic
   3058 rebuild pg_index
   1824 rebuild pg_operator
   1374 rebuild pg_attrdef
   1233 rebuild pg_am
   1110 rebuild pg_tablespace
3971.506 ms

four:

  33328 rebuild pg_opclass
  16020 rebuild pg_amop
  14000 rebuild pg_statistic
  13500 rebuild pg_cast
  10876 rebuild pg_amproc
   5792 rebuild pg_index
   3950 rebuild pg_operator
   2035 rebuild pg_am
   1924 rebuild pg_tablespace
   1746 rebuild pg_attrdef
7927.172 ms

This omits all the work done as part of RelationReloadNailed(), but
shows the problem quite clearly, I think? Basically, every additional
accessed table in a transaction makes things drastically slower.

In the four-join case my four user defined tables were rebuilt a lot
of times:

    463 rebuild blarg
    440 rebuild blarg2
    293 rebuild blarg3
    233 rebuild blarg4

despite obviously not being relevant for the cache invalidation
processing itself.

The list of systable scans in the four table case:

 380278 systable_beginscan: pg_class, using index: 1
 111539 systable_beginscan: pg_attribute, using index: 1
  73544 systable_beginscan: pg_class, using index: 0
   4134 systable_beginscan: pg_opclass, using index: 1
   4099 systable_beginscan: pg_amproc, using index: 1
   2791 systable_beginscan: pg_am, using index: 0
   2061 systable_beginscan: pg_index, using index: 1
   1429 systable_beginscan: pg_attrdef, using index: 1
    345 systable_beginscan: pg_type, using index: 1
    300 systable_beginscan: pg_cast, using index: 1
    195 systable_beginscan: pg_statistic, using index: 1
    191 systable_beginscan: pg_amop, using index: 1
    103 systable_beginscan: pg_operator, using index: 1
     52 systable_beginscan: pg_tablespace, using index: 1
     33 systable_beginscan: pg_proc, using index: 1
     27 systable_beginscan: pg_authid, using index: 1
     20 systable_beginscan: pg_namespace, using index: 1
      4 systable_beginscan: pg_statistic_ext, using index: 1

581145 in total.

> Sure, in practice it will not be the case that a flush occurs at EVERY
> random point. But I think if you try to optimize away a rebuild at
> point B on the grounds that you just did one at point A, you will fail
> to cover the scenario where flush requests arrive at exactly points A
> and B.

I don't think we'd lose a lot of practical coverage if we avoided
rebuilding non-accessed relcache entries eagerly during cache lookups.
What coverage do we e.g. gain by having a single SearchCatCacheMiss()
trigger rebuilding the relcache of a user defined table several times?
The InvalidateSystemCaches() marks all catcache entries as invalid.
The next catcache lookup will thus trigger a cache miss. That cache
miss will typically at least open the previously not locked relation +
index the cache is over. Each of those relation opens will fire off
another InvalidateSystemCaches(), which will rebuild all the surviving
relcache entries at least twice - despite their never being accessed
in that path.

> What it looked like to me, in an admittedly cursory bit of perf
> testing, was that most of the cycles were going into fetching
> cache entries from catalogs over and over. But it's hard to avoid
> that.

Sure - but that's only because we rebuild stuff over and over despite
it not being accessed...

Greetings,

Andres Freund
Hi,

On 2021-05-11 19:02:00 -0700, Andres Freund wrote:
> Why would rebuilding non-accessed relcache entries over and over help
> with that? I am not proposing that we do not mark all cache entries
> as invalid, or that we do not rebuild tables that aren't accessed.

A slightly more concrete proposal:

We introduce a new list of pending relcache invalidations. When
RelationCacheInvalidate() or RelationCacheInvalidateEntry()
invalidates an entry, it gets put on that list (pretty much like the
existing rebuildList in RelationCacheInvalidate(), except longer
lived). When an invalid relcache entry is accessed, it is obviously
immediately rebuilt.

Normally RelationCacheInvalidate() eagerly processes that list, as
well as in ProcessInvalidationMessages(),
ReceiveSharedInvalidMessages() etc. But SearchCatCacheMiss() sets a
flag that prevents the eager processing in RelationCacheInvalidate() -
that avoids needing to repeatedly rebuild relcache entries that aren't
actually accessed as part of a cache miss.

I think just avoiding the repeated relcache rebuilds in
SearchCatCacheMiss() would reduce runtime significantly, even if
SearchCatCacheMiss() at the end would process that list of relcache
invalidations. But I think it might not even be needed to achieve good
coverage? It might be fine to defer processing of the pending list
until the next RelationCacheInvalidate() triggered by a
relation_open() outside of a catcache miss (or obviously until the
entry is accessed next)?

I think this scheme wouldn't just improve CCA performance, but
importantly also normal invalidation processing. Right now we'll often
re-build the same cache entry multiple times as part of a single
ReceiveSharedInvalidMessages(), as it's pretty common that a relation
is the target of DDL in very close-by transactions.

Greetings,

Andres Freund
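To make the pending-list idea concrete, here is a minimal sketch in
plain Python (not PostgreSQL internals; the class and method names are
invented for illustration) of the proposed split between marking
entries invalid and rebuilding them:

```python
# Invalidation appends to a pending set; rebuilds happen lazily on
# access, or in an eager pass that a catcache miss can suppress --
# mirroring the proposed SearchCatCacheMiss() flag.

class RelCacheSketch:
    def __init__(self, relids):
        self.entries = {r: ("desc", r) for r in relids}
        self.pending = set()           # invalidated, not yet rebuilt
        self.in_catcache_miss = False  # the proposed suppression flag
        self.rebuild_count = 0

    def invalidate_all(self):
        self.pending.update(self.entries)

    def process_pending(self):
        # Eager rebuild pass; skipped while inside a catcache miss.
        if self.in_catcache_miss:
            return
        for relid in list(self.pending):
            self._rebuild(relid)

    def open_relation(self, relid):
        if relid in self.pending:
            self._rebuild(relid)       # lazy rebuild on actual access
        return self.entries[relid]

    def _rebuild(self, relid):
        self.entries[relid] = ("desc", relid)
        self.pending.discard(relid)
        self.rebuild_count += 1

cache = RelCacheSketch(range(10))
cache.in_catcache_miss = True
for _ in range(100):         # 100 nested invalidations during one miss
    cache.invalidate_all()
    cache.process_pending()  # suppressed: no eager rebuilds
cache.open_relation(3)       # only the entry actually accessed is rebuilt
print(cache.rebuild_count)   # 1, instead of ~1000 with eager rebuilds
```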
Hi,

On 2021-05-11 19:02:00 -0700, Andres Freund wrote:
> I don't think we'd lose a lot of practical coverage if we avoided
> rebuilding non-accessed relcache entries eagerly during cache
> lookups. What coverage do we e.g. gain by having a single
> SearchCatCacheMiss() triggering rebuilding the relcache of a user
> defined table several times?
>
> The InvalidateSystemCaches() marks all catcache entries as invalid. The
> next catcache lookup will thus trigger a cache miss. That cache miss
> will typically at least open the previously not locked relation + index
> the cache is over. Each of those relation opens will fire off another
> InvalidateSystemCaches(). Which will rebuild all the surviving relcache
> entries at least twice - despite never being accessed in that path.

This is actually worse than I described here, and I think it may point
towards a relatively minimal change that'd improve performance of
debug_invalidate_system_caches_always=1 substantially.

Putting in some instrumentation I noticed that with
debug_invalidate_system_caches_always=1 a single "top level"
SearchCatCacheMiss() triggers up to a hundred
RelationCacheInvalidate() calls.

There's two levels to it: The table_open/index_open done as part of a
SearchCatCacheMiss() will each trigger an invalidation of their own.
But what then drives that up much further is that
RelationCacheInvalidate() will destroy the relcache entries for nearly
all indexes and for pg_amop etc and *not* rebuild them as part of
RelationCacheInvalidate() - there are no references. Which means that
the index_open() on whatever index the syscache uses builds a new
relcache entry. Which then needs to do a RelationInitIndexAccessInfo()
on that index. Which triggers a lot of syscache lookups. Which in turn
need to build pg_amop etc. Which triggers RelationCacheInvalidate()
over and over.
In essence, debug_invalidate_system_caches_always=1 in some important
aspects behaves like debug_invalidate_system_caches_always=3, due to
the syscache involvement.

I think it's worth testing that we actually deal with everything
possible being invalidated as part of a syscache lookup, but I don't
think we learn a ton doing that for the whole build. Particularly when
it prevents us from actually testing more interesting invalidation
scenarios?

What about having a mode where each "nesting" level of
SearchCatCacheMiss allows only one interior InvalidateSystemCaches()?

Here's an example stacktrace showing three nested syscache lookups:

#0  SearchCatCacheMiss (cache=0x55aed2c34e00, nkeys=2, hashValue=3953514454, hashIndex=86, v1=2656, v2=2, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1329
#1  0x000055aed0edf194 in SearchCatCacheInternal (cache=0x55aed2c34e00, nkeys=2, v1=2656, v2=2, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1301
#2  0x000055aed0edeeb8 in SearchCatCache2 (cache=0x55aed2c34e00, v1=2656, v2=2) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1177
#3  0x000055aed0efc9b6 in SearchSysCache2 (cacheId=7, key1=2656, key2=2) at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1145
#4  0x000055aed0ee46aa in get_attoptions (relid=2656, attnum=2) at /home/andres/src/postgresql/src/backend/utils/cache/lsyscache.c:1002
#5  0x000055aed0ef7ebd in RelationGetIndexAttOptions (relation=0x7f873ad21700, copy=false) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:5734
#6  0x000055aed0eefc95 in RelationInitIndexAccessInfo (relation=0x7f873ad21700) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1522
#7  0x000055aed0eee927 in RelationBuildDesc (targetRelId=2656, insertIt=true) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1194
#8  0x000055aed0ef09ea in RelationIdGetRelation (relationId=2656) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2064
#9  0x000055aed083a95d in relation_open (relationId=2656, lockmode=1) at /home/andres/src/postgresql/src/backend/access/common/relation.c:59
#10 0x000055aed08c77af in index_open (relationId=2656, lockmode=1) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:136
#11 0x000055aed08c6be4 in systable_beginscan (heapRelation=0x7f873ad1ec60, indexId=2656, indexOK=true, snapshot=0x0, nkeys=1, key=0x7ffdff557420) at /home/andres/src/postgresql/src/backend/access/index/genam.c:395
#12 0x000055aed0ef436b in AttrDefaultFetch (relation=0x7f873ad1e830, ndef=1) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:4422
#13 0x000055aed0eed6dc in RelationBuildTupleDesc (relation=0x7f873ad1e830) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:689
#14 0x000055aed0eee737 in RelationBuildDesc (targetRelId=16385, insertIt=false) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1147
#15 0x000055aed0ef16e4 in RelationClearRelation (relation=0x7f873ad1c728, rebuild=true) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2592
#16 0x000055aed0ef2391 in RelationCacheInvalidate () at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:3047
#17 0x000055aed0ee2218 in InvalidateSystemCaches () at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:657
#18 0x000055aed0ee230b in AcceptInvalidationMessages () at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:725
#19 0x000055aed0d35204 in LockRelationOid (relid=2610, lockmode=1) at /home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:137
#20 0x000055aed083a953 in relation_open (relationId=2610, lockmode=1) at /home/andres/src/postgresql/src/backend/access/common/relation.c:56
#21 0x000055aed0913b73 in table_open (relationId=2610, lockmode=1) at /home/andres/src/postgresql/src/backend/access/table/table.c:43
#22 0x000055aed0edf2be in SearchCatCacheMiss (cache=0x55aed2c3dc80, nkeys=1, hashValue=1574576467, hashIndex=19, v1=2696, v2=0, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1365
#23 0x000055aed0edf194 in SearchCatCacheInternal (cache=0x55aed2c3dc80, nkeys=1, v1=2696, v2=0, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1301
#24 0x000055aed0edee7d in SearchCatCache1 (cache=0x55aed2c3dc80, v1=2696) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1169
#25 0x000055aed0efc8dd in SearchSysCache1 (cacheId=32, key1=2696) at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1134
#26 0x000055aed0eeef67 in RelationInitIndexAccessInfo (relation=0x7f873ad1e160) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1401
#27 0x000055aed0eee927 in RelationBuildDesc (targetRelId=2696, insertIt=true) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1194
#28 0x000055aed0ef09ea in RelationIdGetRelation (relationId=2696) at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2064
#29 0x000055aed083a95d in relation_open (relationId=2696, lockmode=1) at /home/andres/src/postgresql/src/backend/access/common/relation.c:59
#30 0x000055aed08c77af in index_open (relationId=2696, lockmode=1) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:136
#31 0x000055aed08c6be4 in systable_beginscan (heapRelation=0x7f873ad21040, indexId=2696, indexOK=true, snapshot=0x0, nkeys=3, key=0x7ffdff557f60) at /home/andres/src/postgresql/src/backend/access/index/genam.c:395
#32 0x000055aed0edf30f in SearchCatCacheMiss (cache=0x55aed2c49380, nkeys=3, hashValue=1153660433, hashIndex=17, v1=16385, v2=1, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1367
#33 0x000055aed0edf194 in SearchCatCacheInternal (cache=0x55aed2c49380, nkeys=3, v1=16385, v2=1, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1301
#34 0x000055aed0edeef8 in SearchCatCache3 (cache=0x55aed2c49380, v1=16385, v2=1, v3=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1185
#35 0x000055aed0efca94 in SearchSysCache3 (cacheId=59, key1=16385, key2=1, key3=0) at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1156
#36 0x000055aed0ee75eb in get_attavgwidth (relid=16385, attnum=1) at /home/andres/src/postgresql/src/backend/utils/cache/lsyscache.c:3116

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> In essence, debug_invalidate_system_caches_always=1 in some important aspects
> behaves like debug_invalidate_system_caches_always=3, due to the syscache
> involvement.

Yeah. I think it's important to test those recursive invalidation
scenarios, but it could likely be done more selectively.

> What about having a mode where each "nesting" level of SearchCatCacheMiss
> allows only one interior InvalidateSystemCaches()?

An idea I'd been toying with was to make invals probabilistic, that is
there would be X% chance of an inval being forced at any particular
opportunity. Then you could dial X up or down to make a tradeoff
between speed and the extent of coverage you get from a single run.
(Over time, you could expect pretty complete coverage even with X
not very close to 1, I think.)

This could be extended to what you're thinking about by reducing X
(according to some rule or other) for each level of cache-flush
recursion. The argument to justify that is that recursive cache
flushes are VERY repetitive, so that even a small probability will
add up to full coverage of those code paths fairly quickly.

I've not worked out the math to justify any specific proposal along
this line, though.

			regards, tom lane
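The coverage claim behind the probabilistic idea is easy to quantify:
if each inval opportunity fires independently with probability X, then
a code point reached n times has been clobbered at least once with
probability 1 - (1 - X)^n. A quick sketch (plain Python, illustrative
only):

```python
# Probability that a particular inval opportunity has been exercised
# at least once, given per-opportunity clobber probability x and n
# visits to that code point.

def coverage(x, n):
    return 1.0 - (1.0 - x) ** n

# Even a small x converges on full coverage for frequently-hit points:
for x in (0.01, 0.1, 0.5):
    print(x, coverage(x, 1000))
```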
Hi,

On 2021-05-14 16:53:16 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > In essence, debug_invalidate_system_caches_always=1 in some important aspects
> > behaves like debug_invalidate_system_caches_always=3, due to the syscache
> > involvement.
>
> Yeah. I think it's important to test those recursive invalidation
> scenarios, but it could likely be done more selectively.

Agreed. I wonder if the logic could be something like indicating that
we don't invalidate due to pg_class/attribute/am/... (a set of super
common system catalogs) being opened, iff that open is at the "top
level". So we'd e.g. not trigger invalidation for a syscache miss
scanning pg_class, unless the miss happens during a relcache build.
But we would continue to trigger invalidations without further checks
if e.g. pg_subscription is opened.

> > What about having a mode where each "nesting" level of SearchCatCacheMiss
> > allows only one interior InvalidateSystemCaches()?
>
> An idea I'd been toying with was to make invals probabilistic, that is
> there would be X% chance of an inval being forced at any particular
> opportunity. Then you could dial X up or down to make a tradeoff
> between speed and the extent of coverage you get from a single run.
> (Over time, you could expect pretty complete coverage even with X
> not very close to 1, I think.)
>
> This could be extended to what you're thinking about by reducing X
> (according to some rule or other) for each level of cache-flush
> recursion. The argument to justify that is that recursive cache
> flushes are VERY repetitive, so that even a small probability will
> add up to full coverage of those code paths fairly quickly.

That'd make sense, I've been wondering about something similar. But
I'm a bit worried about that making it harder to reproduce problems
reliably?

> I've not worked out the math to justify any specific proposal
> along this line, though.
FWIW, I've prototyped the idea of only invalidating once for each
syscache level, and it does reduce the runtime of

  CREATE TABLE blarg_{0,1,2,3}(id serial primary key);
  SET debug_invalidate_system_caches_always = 1;
  SELECT * FROM blarg_0 JOIN blarg_1 USING (id) JOIN blarg_2 USING (id) JOIN blarg_3 USING (id);
  RESET ALL;

from 7.5s to 4.7s. The benefits are smaller when fewer tables are
accessed, and larger if more (surprising, right :)).

Greetings,

Andres Freund
On 2021-May-14, Tom Lane wrote:

> An idea I'd been toying with was to make invals probabilistic, that is
> there would be X% chance of an inval being forced at any particular
> opportunity. Then you could dial X up or down to make a tradeoff
> between speed and the extent of coverage you get from a single run.
> (Over time, you could expect pretty complete coverage even with X
> not very close to 1, I think.)

Maybe we could say that debug_invalidate_system_caches_always=2 means
to use the current behavior, and
debug_invalidate_system_caches_always=1 uses some probabilistic rule?

-- 
Álvaro Herrera       Valdivia, Chile
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-14 16:53:16 -0400, Tom Lane wrote:
>> An idea I'd been toying with was to make invals probabilistic, that is
>> there would be X% chance of an inval being forced at any particular
>> opportunity. Then you could dial X up or down to make a tradeoff
>> between speed and the extent of coverage you get from a single run.
>> (Over time, you could expect pretty complete coverage even with X
>> not very close to 1, I think.)

> That'd make sense, I've been wondering about something similar. But I'm
> a bit worried about that making it harder to reproduce problems
> reliably?

Once you know or suspect a problem, you dial X up to 1 and wait.

			regards, tom lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-May-14, Tom Lane wrote:
>> An idea I'd been toying with was to make invals probabilistic, that is
>> there would be X% chance of an inval being forced at any particular
>> opportunity. Then you could dial X up or down to make a tradeoff
>> between speed and the extent of coverage you get from a single run.
>> (Over time, you could expect pretty complete coverage even with X
>> not very close to 1, I think.)

> Maybe we could say that debug_invalidate_system_caches_always=2 means to
> use the current behavior, and debug_invalidate_system_caches_always=1
> uses some probabilistic rule?

What I had in mind was to replace the boolean with an actual fraction.
Probability zero is the non-debug behavior, and probability one gives
you the same result as CLOBBER_CACHE_ALWAYS, and values in between
give you tradeoffs. But I'm not sure exactly how to extend that to
the recursive cases.

			regards, tom lane
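The fraction GUC plus the depth attenuation discussed upthread might
combine along these lines (a hedged sketch in plain Python, not a
patch; the parameter names `p` and `decay` are invented for
illustration):

```python
import random

def should_clobber(p, depth, decay=0.1, rng=random.random):
    """Clobber with probability p at the top level, attenuated
    geometrically per level of cache-flush recursion.  p=0 reproduces
    the non-debug behavior; p=1 at depth 0 reproduces
    CLOBBER_CACHE_ALWAYS."""
    return rng() < p * (decay ** depth)

# Two levels of recursion deep with p=1.0 and decay=0.1, a clobber
# fires only ~1% of the time -- but recursive flushes are repetitive
# enough that this still accumulates coverage quickly.
r = random.Random(0)
hits = sum(should_clobber(1.0, 2, rng=r.random) for _ in range(100000))
print(hits)
```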