Re: stats_ext test fails with -DCATCACHE_FORCE_RELEASE - Mailing list pgsql-hackers

From Tom Lane
Subject Re: stats_ext test fails with -DCATCACHE_FORCE_RELEASE
Date
Msg-id 32282.1525276109@sss.pgh.pa.us
Whole thread Raw
In response to Re: stats_ext test fails with -DCATCACHE_FORCE_RELEASE  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: stats_ext test fails with -DCATCACHE_FORCE_RELEASE  (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
List pgsql-hackers
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> On 2018/05/02 0:33, Tom Lane wrote:
>> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
>>> While playing around with a -DCATCACHE_FORCE_RELEASE build, I noticed that
>>> stats_ext test failed with errors for multiple statements that looked like
>>> this:
>>> ERROR:  invalid ndistinct magic 7f7f7f7f (expected a352bfa4)

>> Interesting.  How come the buildfarm hasn't noticed this?  I should
>> think that the CLOBBER_CACHE_ALWAYS animals, as well as the one(s)
>> using -DCATCACHE_FORCE_RELEASE, would have shown failures.

> I too wondered why.  Fwiw, similar failure occurs in PG 10 branch.

Ah, after looking closer I understand that.  First, there isn't any
buildfarm member using CATCACHE_FORCE_RELEASE --- what I was thinking
of is Andrew's prion, which uses RELCACHE_FORCE_RELEASE.  Not the
same thing.

Second, the nature of the bug is that these functions are reading
from a catcache entry immediately after ReleaseSysCache, when they
should do that immediately before.  CLOBBER_CACHE_ALWAYS does not
trigger the problem because it clobbers cache only at invalidation
opportunities.  In the current implementation, ReleaseSysCache per
se is not an invalidation opportunity.

CATCACHE_FORCE_RELEASE does model a real-world hazard, which is
that if we get an invalidation signal for a catcache item *while
it's pinned*, it'd go away as soon as the last pin is released.
Evidently, these code paths do not contain any invalidation
opportunities occurring while the pin on the stats_ext catcache
entry is already held, so CCA can't trigger the problem.  I think
this means that there's no production hazard here, just a violation
of coding convention.  Nonetheless, we certainly should fix it,
since it's easy to imagine future changes that would create a live
hazard of the tuple going away during the ReleaseSysCache call.

tl;dr: we lack buildfarm coverage of CATCACHE_FORCE_RELEASE.
This is probably bad.  It might be okay to just add that to
prion's configuration; I'm not sure whether there's any value
in testing it separately from RELCACHE_FORCE_RELEASE.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Sort performance cliff with small work_mem
Next
From: Peter Eisentraut
Date:
Subject: Re: Remove mention in docs that foreign keys on partitioned tablesare not supported