Thread: stats_ext test fails with -DCATCACHE_FORCE_RELEASE
Hi. While playing around with a -DCATCACHE_FORCE_RELEASE build, I noticed that stats_ext test failed with errors for multiple statements that looked like this: ERROR: invalid ndistinct magic 7f7f7f7f (expected a352bfa4) I figured it's because statext_dependencies_load() and statext_ndistinct_build() both return a pointer that points directly into a pg_statistics_ext tuple obtained by using SearchSysCache1 that has uncertain lifetime after subsequent call to ReleaseSysCache before returning. I think we should be calling statext_dependencies_deserialize() and statext_ndistinct_deserialize(), respectively, *before* we perform ReleaseSysCache on the tuple. Attached patch does that. Thanks, Amit
Attachment
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > While playing around with a -DCATCACHE_FORCE_RELEASE build, I noticed that > stats_ext test failed with errors for multiple statements that looked like > this: > ERROR: invalid ndistinct magic 7f7f7f7f (expected a352bfa4) Interesting. How come the buildfarm hasn't noticed this? I should think that the CLOBBER_CACHE_ALWAYS animals, as well as the one(s) using -DCATCACHE_FORCE_RELEASE, would have shown failures. regards, tom lane
On 2018/05/02 0:33, Tom Lane wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> While playing around with a -DCATCACHE_FORCE_RELEASE build, I noticed that >> stats_ext test failed with errors for multiple statements that looked like >> this: >> ERROR: invalid ndistinct magic 7f7f7f7f (expected a352bfa4) > > Interesting. How come the buildfarm hasn't noticed this? I should > think that the CLOBBER_CACHE_ALWAYS animals, as well as the one(s) > using -DCATCACHE_FORCE_RELEASE, would have shown failures. I too wondered why. Fwiw, similar failure occurs in PG 10 branch. Thanks, Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2018/05/02 0:33, Tom Lane wrote: >> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >>> While playing around with a -DCATCACHE_FORCE_RELEASE build, I noticed that >>> stats_ext test failed with errors for multiple statements that looked like >>> this: >>> ERROR: invalid ndistinct magic 7f7f7f7f (expected a352bfa4) >> Interesting. How come the buildfarm hasn't noticed this? I should >> think that the CLOBBER_CACHE_ALWAYS animals, as well as the one(s) >> using -DCATCACHE_FORCE_RELEASE, would have shown failures. > I too wondered why. Fwiw, similar failure occurs in PG 10 branch. Ah, after looking closer I understand that. First, there isn't any buildfarm member using CATCACHE_FORCE_RELEASE --- what I was thinking of is Andrew's prion, which uses RELCACHE_FORCE_RELEASE. Not the same thing. Second, the nature of the bug is that these functions are reading from a catcache entry immediately after ReleaseSysCache, when they should do that immediately before. CLOBBER_CACHE_ALWAYS does not trigger the problem because it clobbers cache only at invalidation opportunities. In the current implementation, ReleaseSysCache per se is not an invalidation opportunity. CATCACHE_FORCE_RELEASE does model a real-world hazard, which is that if we get an invalidation signal for a catcache item *while it's pinned*, it'd go away as soon as the last pin is released. Evidently, these code paths do not contain any invalidation opportunities occurring while the pin on the stats_ext catcache entry is already held, so CCA can't trigger the problem. I think this means that there's no production hazard here, just a violation of coding convention. Nonetheless, we certainly should fix it, since it's easy to imagine future changes that would create a live hazard of the tuple going away during the ReleaseSysCache call. tl;dr: we lack buildfarm coverage of CATCACHE_FORCE_RELEASE. This is probably bad. It might be okay to just add that to prion's configuration; I'm not sure whether there's any value in testing it separately from RELCACHE_FORCE_RELEASE. regards, tom lane
On Wed, May 2, 2018 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > tl;dr: we lack buildfarm coverage of CATCACHE_FORCE_RELEASE. > This is probably bad. It might be okay to just add that to > prion's configuration; Will do that pronto. cheers andre -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services