Thread: Recovering from detoast-related catcache invalidations
In bug #18163 [1], Alexander proved the misgivings I had in [2] about
catcache detoasting being possibly unsafe:

>> BTW, while nosing around I found what seems like a very nasty related
>> bug.  Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields.  CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls.  What if one of those should have
>> invalidated this tuple?  We will not notice, because it's not in
>> the hashtable yet.  When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.

Attached is a POC patch for fixing this.  The idea is that if we get an
invalidation while trying to detoast a catalog tuple, we should go around
and re-read the tuple a second time to get an up-to-date version (or,
possibly, discover that it no longer exists).  In the case of
SearchCatCacheList, we have to drop and reload the entire catcache list,
but fortunately not a lot of new code is needed.

The detection of "get an invalidation" could be refined: what I did here
is to check for any advance of SharedInvalidMessageCounter, which clearly
will have a significant number of false positives.  However, the only way
I see to make that a lot better is to temporarily create a placeholder
catcache entry (probably a negative one) with the same keys, and then see
if it got marked dead.  This seems a little expensive, plus I'm afraid
that it'd be actively wrong in the recursive-lookup cases that the
existing comment in SearchCatCacheMiss is talking about (that is, the
catcache entry might mislead any recursive lookup that happens).
Moreover, if we did do something like that then the new code paths would
be essentially untested.
As the patch stands, the reload path seems to get taken 10 to 20 times
during a "make installcheck-parallel" run of the core regression tests
(out of about 150 times that catcache detoasting is required).  Probably
all of those are false-positive cases, but at least they're exercising
the logic.  So I'm inclined to leave it like this, but perhaps somebody
else will have a different opinion.

(BTW, there's a fair amount of existing catcache.c code that will need to
be indented another tab stop, but in the interests of keeping the patch
legible I didn't do that yet.)

Comments?

			regards, tom lane

[1] https://www.postgresql.org/message-id/18163-859bad19a43edcf6%40postgresql.org
[2] https://www.postgresql.org/message-id/1389919.1697144487%40sss.pgh.pa.us

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 1aacb736c2..8255162a72 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1319,6 +1319,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1327,16 +1328,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1351,9 +1342,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
 								  cache->cc_indexoid,
 								  IndexScanOK(cache, cur_skey),
@@ -1362,12 +1372,19 @@ SearchCatCacheMiss(CatCache *cache,
 								  cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
 		ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 									 hashValue, hashIndex,
 									 false);
+		/* upon failure, we must start the scan over */
+		if (ct == NULL)
+		{
+			stale = true;
+			break;
+		}
 		/* immediately set the refcount to 1 */
 		ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
 		ct->refcount++;
@@ -1376,6 +1393,7 @@ SearchCatCacheMiss(CatCache *cache,
 	}
 
 	systable_endscan(scandesc);
+	} while (stale);
 
 	table_close(relation, AccessShareLock);
 
@@ -1398,6 +1416,9 @@ SearchCatCacheMiss(CatCache *cache,
 								 hashValue, hashIndex,
 								 true);
 
+	/* Creating a negative cache entry shouldn't fail */
+	Assert(ct != NULL);
+
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
 			   cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): put neg entry in bucket %d",
@@ -1607,6 +1628,7 @@ SearchCatCacheList(CatCache *cache,
 	 */
 	ResourceOwnerEnlargeCatCacheListRefs(CurrentResourceOwner);
 
+	/* ctlist must be valid throughout the PG_TRY block */
 	ctlist = NIL;
 
 	PG_TRY();
@@ -1614,19 +1636,23 @@ SearchCatCacheList(CatCache *cache,
 		ScanKeyData cur_skey[CATCACHE_MAXKEYS];
 		Relation	relation;
 		SysScanDesc scandesc;
-
-		/*
-		 * Ok, need to make a lookup in the relation, copy the scankey and
-		 * fill out any per-call fields.
-		 */
-		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-		cur_skey[0].sk_argument = v1;
-		cur_skey[1].sk_argument = v2;
-		cur_skey[2].sk_argument = v3;
-		cur_skey[3].sk_argument = v4;
+		bool		stale;
 
 		relation = table_open(cache->cc_reloid, AccessShareLock);
 
+		do
+		{
+			/*
+			 * Ok, need to make a lookup in the relation, copy the scankey and
+			 * fill out any per-call fields.  (We must re-do this when
+			 * retrying, because systable_beginscan scribbles on the scankey.)
+			 */
+			memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+			cur_skey[0].sk_argument = v1;
+			cur_skey[1].sk_argument = v2;
+			cur_skey[2].sk_argument = v3;
+			cur_skey[3].sk_argument = v4;
+
 		scandesc = systable_beginscan(relation,
 									  cache->cc_indexoid,
 									  IndexScanOK(cache, cur_skey),
@@ -1637,6 +1663,8 @@ SearchCatCacheList(CatCache *cache,
 		/* The list will be ordered iff we are doing an index scan */
 		ordered = (scandesc->irel != NULL);
 
+		stale = false;
+
 		while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 		{
 			uint32		hashValue;
@@ -1682,6 +1710,30 @@ SearchCatCacheList(CatCache *cache,
 				ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 											 hashValue, hashIndex,
 											 false);
+				/* upon failure, we must start the scan over */
+				if (ct == NULL)
+				{
+					/*
+					 * Release refcounts on any items we already had.  We dare
+					 * not try to free them if they're now unreferenced, since
+					 * an error while doing that would result in the PG_CATCH
+					 * below doing extra refcount decrements.  Besides, we'll
+					 * likely re-adopt those items in the next iteration, so
+					 * it's not worth complicating matters to try to get rid
+					 * of them.
+					 */
+					foreach(ctlist_item, ctlist)
+					{
+						ct = (CatCTup *) lfirst(ctlist_item);
+						Assert(ct->c_list == NULL);
+						Assert(ct->refcount > 0);
+						ct->refcount--;
+					}
+					/* Reset ctlist in preparation for new try */
+					ctlist = NIL;
+					stale = true;
+					break;
+				}
 			}
 
 			/* Careful here: add entry to ctlist, then bump its refcount */
@@ -1691,6 +1743,7 @@ SearchCatCacheList(CatCache *cache,
 		}
 
 		systable_endscan(scandesc);
+		} while (stale);
 
 		table_close(relation, AccessShareLock);
 
@@ -1797,6 +1850,10 @@ ReleaseCatCacheList(CatCList *list)
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
+ *
+ * Returns NULL if we attempt to detoast the tuple and observe that it
+ * became stale.  (This cannot happen for a negative entry.)  Caller must
+ * retry the tuple lookup in that case.
  */
 static CatCTup *
 CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
@@ -1820,9 +1877,24 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	 * entry, and also protects us against the possibility of the toast
 	 * tuples being freed before we attempt to fetch them, in case of
 	 * something using a slightly stale catcache entry.
+	 *
+	 * If we see SharedInvalidMessageCounter advance during detoasting,
+	 * assume that the tuple we are interested in might have been
+	 * invalidated, and report failure.  (Since we don't yet have a
+	 * catcache entry representing the tuple, it's impossible to tell for
+	 * sure, and we have to be conservative.)
 	 */
 	if (HeapTupleHasExternal(ntp))
+	{
+		uint64		inval_count = SharedInvalidMessageCounter;
+
 		dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
+		if (inval_count != SharedInvalidMessageCounter)
+		{
+			heap_freetuple(dtp);
+			return NULL;
+		}
+	}
 	else
 		dtp = ntp;
I wrote:
> In bug #18163 [1], Alexander proved the misgivings I had in [2]
> about catcache detoasting being possibly unsafe:
> ...
> Attached is a POC patch for fixing this.

The cfbot pointed out that this needed a rebase.  No substantive changes.

			regards, tom lane

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 2e2e4d9f1f..2d72e8106c 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1372,6 +1372,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1380,16 +1381,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1404,9 +1395,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
 								  cache->cc_indexoid,
 								  IndexScanOK(cache, cur_skey),
@@ -1415,12 +1425,19 @@ SearchCatCacheMiss(CatCache *cache,
 								  cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
 		ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 									 hashValue, hashIndex,
 									 false);
+		/* upon failure, we must start the scan over */
+		if (ct == NULL)
+		{
+			stale = true;
+			break;
+		}
 		/* immediately set the refcount to 1 */
 		ResourceOwnerEnlarge(CurrentResourceOwner);
 		ct->refcount++;
@@ -1429,6 +1446,7 @@ SearchCatCacheMiss(CatCache *cache,
 	}
 
 	systable_endscan(scandesc);
+	} while (stale);
 
 	table_close(relation, AccessShareLock);
 
@@ -1451,6 +1469,9 @@ SearchCatCacheMiss(CatCache *cache,
 								 hashValue, hashIndex,
 								 true);
 
+	/* Creating a negative cache entry shouldn't fail */
+	Assert(ct != NULL);
+
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
 			   cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): put neg entry in bucket %d",
@@ -1663,7 +1684,8 @@ SearchCatCacheList(CatCache *cache,
 	 * We have to bump the member refcounts temporarily to ensure they won't
 	 * get dropped from the cache while loading other members. We use a PG_TRY
 	 * block to ensure we can undo those refcounts if we get an error before
-	 * we finish constructing the CatCList.
+	 * we finish constructing the CatCList.  ctlist must be valid throughout
+	 * the PG_TRY block.
 	 */
 	ctlist = NIL;
 
@@ -1672,19 +1694,23 @@ SearchCatCacheList(CatCache *cache,
 		ScanKeyData cur_skey[CATCACHE_MAXKEYS];
 		Relation	relation;
 		SysScanDesc scandesc;
-
-		/*
-		 * Ok, need to make a lookup in the relation, copy the scankey and
-		 * fill out any per-call fields.
-		 */
-		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-		cur_skey[0].sk_argument = v1;
-		cur_skey[1].sk_argument = v2;
-		cur_skey[2].sk_argument = v3;
-		cur_skey[3].sk_argument = v4;
+		bool		stale;
 
 		relation = table_open(cache->cc_reloid, AccessShareLock);
 
+		do
+		{
+			/*
+			 * Ok, need to make a lookup in the relation, copy the scankey and
+			 * fill out any per-call fields.  (We must re-do this when
+			 * retrying, because systable_beginscan scribbles on the scankey.)
+			 */
+			memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+			cur_skey[0].sk_argument = v1;
+			cur_skey[1].sk_argument = v2;
+			cur_skey[2].sk_argument = v3;
+			cur_skey[3].sk_argument = v4;
+
 		scandesc = systable_beginscan(relation,
 									  cache->cc_indexoid,
 									  IndexScanOK(cache, cur_skey),
@@ -1695,6 +1721,8 @@ SearchCatCacheList(CatCache *cache,
 		/* The list will be ordered iff we are doing an index scan */
 		ordered = (scandesc->irel != NULL);
 
+		stale = false;
+
 		while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 		{
 			uint32		hashValue;
@@ -1740,6 +1768,30 @@ SearchCatCacheList(CatCache *cache,
 				ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 											 hashValue, hashIndex,
 											 false);
+				/* upon failure, we must start the scan over */
+				if (ct == NULL)
+				{
+					/*
+					 * Release refcounts on any items we already had.  We dare
+					 * not try to free them if they're now unreferenced, since
+					 * an error while doing that would result in the PG_CATCH
+					 * below doing extra refcount decrements.  Besides, we'll
+					 * likely re-adopt those items in the next iteration, so
+					 * it's not worth complicating matters to try to get rid
+					 * of them.
+					 */
+					foreach(ctlist_item, ctlist)
+					{
+						ct = (CatCTup *) lfirst(ctlist_item);
+						Assert(ct->c_list == NULL);
+						Assert(ct->refcount > 0);
+						ct->refcount--;
+					}
+					/* Reset ctlist in preparation for new try */
+					ctlist = NIL;
+					stale = true;
+					break;
+				}
 			}
 
 			/* Careful here: add entry to ctlist, then bump its refcount */
@@ -1749,6 +1801,7 @@ SearchCatCacheList(CatCache *cache,
 		}
 
 		systable_endscan(scandesc);
+		} while (stale);
 
 		table_close(relation, AccessShareLock);
 
@@ -1865,6 +1918,10 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner)
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
+ *
+ * Returns NULL if we attempt to detoast the tuple and observe that it
+ * became stale.  (This cannot happen for a negative entry.)  Caller must
+ * retry the tuple lookup in that case.
  */
 static CatCTup *
 CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
@@ -1888,9 +1945,24 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	 * entry, and also protects us against the possibility of the toast
 	 * tuples being freed before we attempt to fetch them, in case of
 	 * something using a slightly stale catcache entry.
+	 *
+	 * If we see SharedInvalidMessageCounter advance during detoasting,
+	 * assume that the tuple we are interested in might have been
+	 * invalidated, and report failure.  (Since we don't yet have a
+	 * catcache entry representing the tuple, it's impossible to tell for
+	 * sure, and we have to be conservative.)
 	 */
 	if (HeapTupleHasExternal(ntp))
+	{
+		uint64		inval_count = SharedInvalidMessageCounter;
+
 		dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
+		if (inval_count != SharedInvalidMessageCounter)
+		{
+			heap_freetuple(dtp);
+			return NULL;
+		}
+	}
 	else
 		dtp = ntp;
Hi,
>> BTW, while nosing around I found what seems like a very nasty related
>> bug. Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields. CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls. What if one of those should have
>> invalidated this tuple? We will not notice, because it's not in
>> the hashtable yet. When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.
I spent some time trying to understand the bug, and I can now reproduce it locally with the following steps:
Step 1:
Create a function called 'test' with a body long enough that it must be stored in a TOAST table,
and put it in schema 'yy' with "alter function test set schema yy";
Step 2:
I added a breakpoint at 'toast_flatten_tuple' in session 1,
then executed the following SQL:
----------
set search_path='public';
alter function test set schema xx;
----------
Step 3:
When session 1 stops at the breakpoint, I open session 2 and execute
-----------
set search_path = 'yy';
alter function test set schema public;
-----------
Step 4:
Resume session 1; it reports "ERROR: could not find a function named "test"".
Step 5:
Continue to execute "alter function test set schema xx;" in session 1. It still fails with the same error, even though function 'test' now belongs to schema 'public'.
Obviously, in session 1, the "test" proc tuple in the cache is outdated.
>> The detection of "get an invalidation" could be refined: what I did
>> here is to check for any advance of SharedInvalidMessageCounter,
>> which clearly will have a significant number of false positives.
>> However, the only way I see to make that a lot better is to
>> temporarily create a placeholder catcache entry (probably a negative
>> one) with the same keys, and then see if it got marked dead.
>> This seems a little expensive, plus I'm afraid that it'd be actively
>> wrong in the recursive-lookup cases that the existing comment in
>> SearchCatCacheMiss is talking about (that is, the catcache entry
>> might mislead any recursive lookup that happens).
I have reviewed your patch, and it looks good. But instead of checking for any advance of SharedInvalidMessageCounter (which is a little expensive when the invalidation message is unrelated to the current tuple), I have another idea: we can recheck the visibility of the tuple with the CatalogSnapshot (the CatalogSnapshot must be refreshed whenever there are SharedInvalidMessages). If the tuple is not visible, we re-fetch it; otherwise, we can continue to use it, since it is not outdated.
I added a commit based on your patch and attached it.
Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
>>> The detection of "get an invalidation" could be refined: what I did
>>> here is to check for any advance of SharedInvalidMessageCounter,
>>> which clearly will have a significant number of false positives.

> I have reviewed your patch, and it looks good. But instead of checking for
> any advance of SharedInvalidMessageCounter ( if the invalidate message is
> not related to the current tuple, it is a little expensive) I have another
> idea: we can recheck the visibility of the tuple with CatalogSnapshot(the
> CatalogSnapthot must be refreshed if there is any SharedInvalidMessages) if
> it is not visible, we re-fetch the tuple, otherwise, we can continue to use
> it as it is not outdated.

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable.  Isn't there a cleaner way to make this check?

Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
than GetCatalogSnapshot is the right thing, because the catcaches
use the latter.

			regards, tom lane
> Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
> than GetCatalogSnapshot is the right thing, because the catcaches
> use the latter.
Yes, you are right; we should use GetCatalogSnapshot here.
> Maybe, but that undocumented hack in SetHintBits seems completely
> unacceptable. Isn't there a cleaner way to make this check?
Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check whether the tuple has been deleted.
Since the tuple's xmin must have been committed, we just need to check whether its xmax is committed,
like below:
------------
@@ -1956,9 +1956,11 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
*/
if (HeapTupleHasExternal(ntp))
{
+ TransactionId xmax;
dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
- if (!HeapTupleSatisfiesVisibility(ntp, GetNonHistoricCatalogSnapshot(cache->cc_reloid), InvalidBuffer))
+ xmax = HeapTupleHeaderGetUpdateXid(ntp->t_data);
+ if (TransactionIdIsValid(xmax) && TransactionIdDidCommit(xmax))
{
heap_freetuple(dtp);
return NULL;
------------
I'm not quite sure the code is correct; I cannot yet fully understand 'HeapTupleHeaderGetUpdateXid', and I need more time to dive into it.
Any thoughts?
Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
>> Maybe, but that undocumented hack in SetHintBits seems completely
>> unacceptable.  Isn't there a cleaner way to make this check?

> Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check if the
> tuple has been deleted.
> As the tuple's xmin must been committed, so we just need to check if its
> xmax is committed,

I'm not super thrilled with that.  Something I realized last night is
that your proposal only works if "ntp" is pointing directly into the
catalog's disk buffers.  If something earlier than this code had made a
local-memory copy of the catalog tuple, then it's possible that its
header fields (particularly xmax) are out of date compared to shared
buffers and would fail to tell us that some other process just
invalidated the tuple.  Now in fact, with the current implementation of
syscache_getnext() the result is actually a live tuple and so we can
expect to see any relevant updates.  But I think we'd better add some
Asserts that that's so; and that also provides us with a way to call
HeapTupleSatisfiesVisibility fully legally, because we can get the
buffer reference out of the scan descriptor too.

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want to
add some new provisions to that API.  Andres, do you have any thoughts
about that?

Anyway, this approach gets rid of false positives, which is great for
performance and bad for testing.  Code coverage says that now we never
hit the failure paths during regression tests, which is unsurprising,
but I'm not very comfortable with leaving those paths unexercised.  I
tried to make an isolation test to exercise them, but there's no good
way at the SQL level to get a session to block during the detoast step.
LOCK TABLE on some catalog's toast table would do, but we disallow it.
I thought about adding a small C function to regress.so to take out such
a lock, but we have no infrastructure for referencing regress.so from
isolation tests.  What I ended up doing is adding a random failure about
0.1% of the time in USE_ASSERT_CHECKING builds --- that's intellectually
ugly for sure, but doing better seems like way more work than it's
worth.

			regards, tom lane

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index c0591cffe5..b1ce7bf513 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -15,6 +15,7 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/heapam.h"
 #include "access/heaptoast.h"
 #include "access/relscan.h"
 #include "access/sysattr.h"
@@ -24,8 +25,10 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
+#include "common/pg_prng.h"
 #include "miscadmin.h"
 #include "port/pg_bitutils.h"
+#include "storage/bufmgr.h"
 #ifdef CATCACHE_STATS
 #include "storage/ipc.h"		/* for on_proc_exit */
 #endif
@@ -90,10 +93,10 @@ static void CatCachePrintStats(int code, Datum arg);
 static void CatCacheRemoveCTup(CatCache *cache, CatCTup *ct);
 static void CatCacheRemoveCList(CatCache *cache, CatCList *cl);
 static void CatalogCacheInitializeCache(CatCache *cache);
-static CatCTup *CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp,
+static CatCTup *CatalogCacheCreateEntry(CatCache *cache,
+										HeapTuple ntp, SysScanDesc scandesc,
 										Datum *arguments,
-										uint32 hashValue, Index hashIndex,
-										bool negative);
+										uint32 hashValue, Index hashIndex);
 static void ReleaseCatCacheWithOwner(HeapTuple tuple,
 									 ResourceOwner resowner);
 static void ReleaseCatCacheListWithOwner(CatCList *list,
 										 ResourceOwner resowner);
@@ -1372,6 +1375,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1380,16 +1384,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1404,9 +1398,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
 								  cache->cc_indexoid,
 								  IndexScanOK(cache, cur_skey),
@@ -1415,12 +1428,18 @@ SearchCatCacheMiss(CatCache *cache,
 								  cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
-		ct = CatalogCacheCreateEntry(cache, ntp, arguments,
-									 hashValue, hashIndex,
-									 false);
+		ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
+									 hashValue, hashIndex);
+		/* upon failure, we must start the scan over */
+		if (ct == NULL)
+		{
+			stale = true;
+			break;
+		}
 		/* immediately set the refcount to 1 */
 		ResourceOwnerEnlarge(CurrentResourceOwner);
 		ct->refcount++;
@@ -1429,6 +1448,7 @@ SearchCatCacheMiss(CatCache *cache,
 	}
 
 	systable_endscan(scandesc);
+	} while (stale);
 
 	table_close(relation, AccessShareLock);
 
@@ -1447,9 +1467,11 @@ SearchCatCacheMiss(CatCache *cache,
 	if (IsBootstrapProcessingMode())
 		return NULL;
 
-	ct = CatalogCacheCreateEntry(cache, NULL, arguments,
-								 hashValue, hashIndex,
-								 true);
+	ct = CatalogCacheCreateEntry(cache, NULL, NULL, arguments,
+								 hashValue, hashIndex);
+
+	/* Creating a negative cache entry shouldn't fail */
+	Assert(ct != NULL);
 
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
 			   cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
@@ -1663,7 +1685,8 @@ SearchCatCacheList(CatCache *cache,
 	 * We have to bump the member refcounts temporarily to ensure they won't
 	 * get dropped from the cache while loading other members. We use a PG_TRY
 	 * block to ensure we can undo those refcounts if we get an error before
-	 * we finish constructing the CatCList.
+	 * we finish constructing the CatCList.  ctlist must be valid throughout
+	 * the PG_TRY block.
 	 */
 	ctlist = NIL;
 
@@ -1672,19 +1695,23 @@ SearchCatCacheList(CatCache *cache,
 		ScanKeyData cur_skey[CATCACHE_MAXKEYS];
 		Relation	relation;
 		SysScanDesc scandesc;
-
-		/*
-		 * Ok, need to make a lookup in the relation, copy the scankey and
-		 * fill out any per-call fields.
-		 */
-		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-		cur_skey[0].sk_argument = v1;
-		cur_skey[1].sk_argument = v2;
-		cur_skey[2].sk_argument = v3;
-		cur_skey[3].sk_argument = v4;
+		bool		stale;
 
 		relation = table_open(cache->cc_reloid, AccessShareLock);
 
+		do
+		{
+			/*
+			 * Ok, need to make a lookup in the relation, copy the scankey and
+			 * fill out any per-call fields.  (We must re-do this when
+			 * retrying, because systable_beginscan scribbles on the scankey.)
+			 */
+			memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+			cur_skey[0].sk_argument = v1;
+			cur_skey[1].sk_argument = v2;
+			cur_skey[2].sk_argument = v3;
+			cur_skey[3].sk_argument = v4;
+
 		scandesc = systable_beginscan(relation,
 									  cache->cc_indexoid,
 									  IndexScanOK(cache, cur_skey),
@@ -1695,6 +1722,8 @@ SearchCatCacheList(CatCache *cache,
 		/* The list will be ordered iff we are doing an index scan */
 		ordered = (scandesc->irel != NULL);
 
+		stale = false;
+
 		while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 		{
 			uint32		hashValue;
@@ -1737,9 +1766,32 @@ SearchCatCacheList(CatCache *cache,
 			if (!found)
 			{
 				/* We didn't find a usable entry, so make a new one */
-				ct = CatalogCacheCreateEntry(cache, ntp, arguments,
-											 hashValue, hashIndex,
-											 false);
+				ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
+											 hashValue, hashIndex);
+				/* upon failure, we must start the scan over */
+				if (ct == NULL)
+				{
+					/*
+					 * Release refcounts on any items we already had.  We dare
+					 * not try to free them if they're now unreferenced, since
+					 * an error while doing that would result in the PG_CATCH
+					 * below doing extra refcount decrements.  Besides, we'll
+					 * likely re-adopt those items in the next iteration, so
+					 * it's not worth complicating matters to try to get rid
+					 * of them.
+					 */
+					foreach(ctlist_item, ctlist)
+					{
+						ct = (CatCTup *) lfirst(ctlist_item);
+						Assert(ct->c_list == NULL);
+						Assert(ct->refcount > 0);
+						ct->refcount--;
+					}
+					/* Reset ctlist in preparation for new try */
+					ctlist = NIL;
+					stale = true;
+					break;
+				}
 			}
 
 			/* Careful here: add entry to ctlist, then bump its refcount */
@@ -1749,6 +1801,7 @@ SearchCatCacheList(CatCache *cache,
 		}
 
 		systable_endscan(scandesc);
+		} while (stale);
 
 		table_close(relation, AccessShareLock);
 
@@ -1865,22 +1918,41 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner)
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
+ *
+ * To create a normal cache entry, ntp must be the HeapTuple just fetched
+ * from scandesc, and "arguments" is not used.  To create a negative cache
+ * entry, pass NULL for ntp and scandesc; then "arguments" is the cache
+ * keys to use.  In either case, hashValue/hashIndex are the hash values
+ * computed from the cache keys.
+ *
+ * Returns NULL if we attempt to detoast the tuple and observe that it
+ * became stale.  (This cannot happen for a negative entry.)  Caller must
+ * retry the tuple lookup in that case.
  */
 static CatCTup *
-CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
-						uint32 hashValue, Index hashIndex,
-						bool negative)
+CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,
+						Datum *arguments,
+						uint32 hashValue, Index hashIndex)
 {
 	CatCTup    *ct;
 	HeapTuple	dtp;
 	MemoryContext oldcxt;
 
-	/* negative entries have no tuple associated */
 	if (ntp)
 	{
 		int			i;
 
-		Assert(!negative);
+		/*
+		 * The visibility check below essentially never fails during our
+		 * regression tests, and there's no easy way to force it to fail for
+		 * testing purposes.  To ensure we have test coverage for the retry
+		 * paths in our callers, make debug builds randomly fail about 0.1% of
+		 * the times through this code path.
+		 */
+#ifdef USE_ASSERT_CHECKING
+		if (pg_prng_uint32(&pg_global_prng_state) <= (PG_UINT32_MAX / 1000))
+			return NULL;
+#endif
 
 		/*
 		 * If there are any out-of-line toasted fields in the tuple, expand
@@ -1888,9 +1960,32 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 		 * entry, and also protects us against the possibility of the toast
 		 * tuples being freed before we attempt to fetch them, in case of
 		 * something using a slightly stale catcache entry.
+		 *
+		 * The tuple could become stale while we are doing toast table access
+		 * (since AcceptInvalidationMessages can run then), so we must recheck
+		 * its visibility afterwards.  To do that, we must be undesirably
+		 * familiar with the fact that systable_getnext() will return a direct
+		 * pointer into a catalog disk buffer.  This code is likely to need
+		 * work if we ever want to support non-heap catalogs.
*/ if (HeapTupleHasExternal(ntp)) + { + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) scandesc->slot; + + /* Assert that we have a live pointer to a pinned catalog buffer */ + Assert(TTS_IS_BUFFERTUPLE(scandesc->slot)); + Assert(ntp == bslot->base.tuple); + Assert(BufferIsValid(bslot->buffer)); + dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc); + if (!HeapTupleSatisfiesVisibility(ntp, + GetCatalogSnapshot(cache->cc_reloid), + bslot->buffer)) + { + heap_freetuple(dtp); + return NULL; + } + } else dtp = ntp; @@ -1929,7 +2024,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, } else { - Assert(negative); + /* Set up keys for a negative cache entry */ oldcxt = MemoryContextSwitchTo(CacheMemoryContext); ct = (CatCTup *) palloc(sizeof(CatCTup)); @@ -1951,7 +2046,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, ct->c_list = NULL; ct->refcount = 0; /* for the moment */ ct->dead = false; - ct->negative = negative; + ct->negative = (ntp == NULL); ct->hash_value = hashValue; dlist_push_head(&cache->cc_bucket[hashIndex], &ct->cache_elem);
I wrote: > This is uncomfortably much in bed with the tuple table slot code, > perhaps, but I don't see a way to do it more cleanly unless we want > to add some new provisions to that API. Andres, do you have any > thoughts about that? Oh! After nosing around a bit more I remembered systable_recheck_tuple, which is meant for exactly this purpose. So v4 attached. regards, tom lane diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index c0591cffe5..0dcd45d2f0 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -24,6 +24,7 @@ #include "catalog/pg_operator.h" #include "catalog/pg_type.h" #include "common/hashfn.h" +#include "common/pg_prng.h" #include "miscadmin.h" #include "port/pg_bitutils.h" #ifdef CATCACHE_STATS @@ -90,10 +91,10 @@ static void CatCachePrintStats(int code, Datum arg); static void CatCacheRemoveCTup(CatCache *cache, CatCTup *ct); static void CatCacheRemoveCList(CatCache *cache, CatCList *cl); static void CatalogCacheInitializeCache(CatCache *cache); -static CatCTup *CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, +static CatCTup *CatalogCacheCreateEntry(CatCache *cache, + HeapTuple ntp, SysScanDesc scandesc, Datum *arguments, - uint32 hashValue, Index hashIndex, - bool negative); + uint32 hashValue, Index hashIndex); static void ReleaseCatCacheWithOwner(HeapTuple tuple, ResourceOwner resowner); static void ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner); @@ -1372,6 +1373,7 @@ SearchCatCacheMiss(CatCache *cache, SysScanDesc scandesc; HeapTuple ntp; CatCTup *ct; + bool stale; Datum arguments[CATCACHE_MAXKEYS]; /* Initialize local parameter array */ @@ -1380,16 +1382,6 @@ SearchCatCacheMiss(CatCache *cache, arguments[2] = v3; arguments[3] = v4; - /* - * Ok, need to make a lookup in the relation, copy the scankey and fill - * out any per-call fields. 
- */ - memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys); - cur_skey[0].sk_argument = v1; - cur_skey[1].sk_argument = v2; - cur_skey[2].sk_argument = v3; - cur_skey[3].sk_argument = v4; - /* * Tuple was not found in cache, so we have to try to retrieve it directly * from the relation. If found, we will add it to the cache; if not @@ -1404,9 +1396,28 @@ SearchCatCacheMiss(CatCache *cache, * will eventually age out of the cache, so there's no functional problem. * This case is rare enough that it's not worth expending extra cycles to * detect. + * + * Another case, which we *must* handle, is that the tuple could become + * outdated during CatalogCacheCreateEntry's attempt to detoast it (since + * AcceptInvalidationMessages can run during TOAST table access). We do + * not want to return already-stale catcache entries, so we loop around + * and do the table scan again if that happens. */ relation = table_open(cache->cc_reloid, AccessShareLock); + do + { + /* + * Ok, need to make a lookup in the relation, copy the scankey and + * fill out any per-call fields. (We must re-do this when retrying, + * because systable_beginscan scribbles on the scankey.) 
+ */ + memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys); + cur_skey[0].sk_argument = v1; + cur_skey[1].sk_argument = v2; + cur_skey[2].sk_argument = v3; + cur_skey[3].sk_argument = v4; + scandesc = systable_beginscan(relation, cache->cc_indexoid, IndexScanOK(cache, cur_skey), @@ -1415,12 +1426,18 @@ SearchCatCacheMiss(CatCache *cache, cur_skey); ct = NULL; + stale = false; while (HeapTupleIsValid(ntp = systable_getnext(scandesc))) { - ct = CatalogCacheCreateEntry(cache, ntp, arguments, - hashValue, hashIndex, - false); + ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL, + hashValue, hashIndex); + /* upon failure, we must start the scan over */ + if (ct == NULL) + { + stale = true; + break; + } /* immediately set the refcount to 1 */ ResourceOwnerEnlarge(CurrentResourceOwner); ct->refcount++; @@ -1429,6 +1446,7 @@ SearchCatCacheMiss(CatCache *cache, } systable_endscan(scandesc); + } while (stale); table_close(relation, AccessShareLock); @@ -1447,9 +1465,11 @@ SearchCatCacheMiss(CatCache *cache, if (IsBootstrapProcessingMode()) return NULL; - ct = CatalogCacheCreateEntry(cache, NULL, arguments, - hashValue, hashIndex, - true); + ct = CatalogCacheCreateEntry(cache, NULL, NULL, arguments, + hashValue, hashIndex); + + /* Creating a negative cache entry shouldn't fail */ + Assert(ct != NULL); CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples", cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup); @@ -1663,7 +1683,8 @@ SearchCatCacheList(CatCache *cache, * We have to bump the member refcounts temporarily to ensure they won't * get dropped from the cache while loading other members. We use a PG_TRY * block to ensure we can undo those refcounts if we get an error before - * we finish constructing the CatCList. + * we finish constructing the CatCList. ctlist must be valid throughout + * the PG_TRY block. 
*/ ctlist = NIL; @@ -1672,19 +1693,23 @@ SearchCatCacheList(CatCache *cache, ScanKeyData cur_skey[CATCACHE_MAXKEYS]; Relation relation; SysScanDesc scandesc; - - /* - * Ok, need to make a lookup in the relation, copy the scankey and - * fill out any per-call fields. - */ - memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys); - cur_skey[0].sk_argument = v1; - cur_skey[1].sk_argument = v2; - cur_skey[2].sk_argument = v3; - cur_skey[3].sk_argument = v4; + bool stale; relation = table_open(cache->cc_reloid, AccessShareLock); + do + { + /* + * Ok, need to make a lookup in the relation, copy the scankey and + * fill out any per-call fields. (We must re-do this when + * retrying, because systable_beginscan scribbles on the scankey.) + */ + memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys); + cur_skey[0].sk_argument = v1; + cur_skey[1].sk_argument = v2; + cur_skey[2].sk_argument = v3; + cur_skey[3].sk_argument = v4; + scandesc = systable_beginscan(relation, cache->cc_indexoid, IndexScanOK(cache, cur_skey), @@ -1695,6 +1720,8 @@ SearchCatCacheList(CatCache *cache, /* The list will be ordered iff we are doing an index scan */ ordered = (scandesc->irel != NULL); + stale = false; + while (HeapTupleIsValid(ntp = systable_getnext(scandesc))) { uint32 hashValue; @@ -1737,9 +1764,32 @@ SearchCatCacheList(CatCache *cache, if (!found) { /* We didn't find a usable entry, so make a new one */ - ct = CatalogCacheCreateEntry(cache, ntp, arguments, - hashValue, hashIndex, - false); + ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL, + hashValue, hashIndex); + /* upon failure, we must start the scan over */ + if (ct == NULL) + { + /* + * Release refcounts on any items we already had. We dare + * not try to free them if they're now unreferenced, since + * an error while doing that would result in the PG_CATCH + * below doing extra refcount decrements. 
Besides, we'll + * likely re-adopt those items in the next iteration, so + * it's not worth complicating matters to try to get rid + * of them. + */ + foreach(ctlist_item, ctlist) + { + ct = (CatCTup *) lfirst(ctlist_item); + Assert(ct->c_list == NULL); + Assert(ct->refcount > 0); + ct->refcount--; + } + /* Reset ctlist in preparation for new try */ + ctlist = NIL; + stale = true; + break; + } } /* Careful here: add entry to ctlist, then bump its refcount */ @@ -1749,6 +1799,7 @@ SearchCatCacheList(CatCache *cache, } systable_endscan(scandesc); + } while (stale); table_close(relation, AccessShareLock); @@ -1865,22 +1916,42 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner) * CatalogCacheCreateEntry * Create a new CatCTup entry, copying the given HeapTuple and other * supplied data into it. The new entry initially has refcount 0. + * + * To create a normal cache entry, ntp must be the HeapTuple just fetched + * from scandesc, and "arguments" is not used. To create a negative cache + * entry, pass NULL for ntp and scandesc; then "arguments" is the cache + * keys to use. In either case, hashValue/hashIndex are the hash values + * computed from the cache keys. + * + * Returns NULL if we attempt to detoast the tuple and observe that it + * became stale. (This cannot happen for a negative entry.) Caller must + * retry the tuple lookup in that case. */ static CatCTup * -CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, - uint32 hashValue, Index hashIndex, - bool negative) +CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc, + Datum *arguments, + uint32 hashValue, Index hashIndex) { CatCTup *ct; HeapTuple dtp; MemoryContext oldcxt; - /* negative entries have no tuple associated */ if (ntp) { int i; - Assert(!negative); + /* + * The visibility recheck below essentially never fails during our + * regression tests, and there's no easy way to force it to fail for + * testing purposes. 
To ensure we have test coverage for the retry + * paths in our callers, make debug builds randomly fail about 0.1% of + * the times through this code path, even when there's no toasted + * fields. + */ +#ifdef USE_ASSERT_CHECKING + if (pg_prng_uint32(&pg_global_prng_state) <= (PG_UINT32_MAX / 1000)) + return NULL; +#endif /* * If there are any out-of-line toasted fields in the tuple, expand @@ -1890,7 +1961,20 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, * something using a slightly stale catcache entry. */ if (HeapTupleHasExternal(ntp)) + { dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc); + + /* + * The tuple could become stale while we are doing toast table + * access (since AcceptInvalidationMessages can run then), so we + * must recheck its visibility afterwards. + */ + if (!systable_recheck_tuple(scandesc, ntp)) + { + heap_freetuple(dtp); + return NULL; + } + } else dtp = ntp; @@ -1929,7 +2013,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, } else { - Assert(negative); + /* Set up keys for a negative cache entry */ oldcxt = MemoryContextSwitchTo(CacheMemoryContext); ct = (CatCTup *) palloc(sizeof(CatCTup)); @@ -1951,7 +2035,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, ct->c_list = NULL; ct->refcount = 0; /* for the moment */ ct->dead = false; - ct->negative = negative; + ct->negative = (ntp == NULL); ct->hash_value = hashValue; dlist_push_head(&cache->cc_bucket[hashIndex], &ct->cache_elem);
Great! That's exactly what we need.
The patch LGTM, +1
Tom Lane <tgl@sss.pgh.pa.us> wrote on Saturday, January 13, 2024 at 04:47:
> I wrote:
>> This is uncomfortably much in bed with the tuple table slot code,
>> perhaps, but I don't see a way to do it more cleanly unless we want
>> to add some new provisions to that API. Andres, do you have any
>> thoughts about that?
>
> Oh! After nosing around a bit more I remembered systable_recheck_tuple,
> which is meant for exactly this purpose. So v4 attached.
>
> 			regards, tom lane
Hmm, how about first checking whether any invalidated shared messages have been accepted, and only then rechecking the tuple's visibility?
If no invalidated shared message was accepted during 'toast_flatten_tuple', there is no need to do the visibility check, which can save several
CPU cycles.
----
if (inval_count != SharedInvalidMessageCounter && !systable_recheck_tuple(scandesc, ntp))
{
heap_freetuple(dtp);
return NULL;
}
----
Xiaoran Wang <fanfuxiaoran@gmail.com> wrote on Saturday, January 13, 2024 at 13:16:
> Great! That's exactly what we need.
> The patch LGTM, +1
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote on Saturday, January 13, 2024 at 04:47:
>> I wrote:
>>> This is uncomfortably much in bed with the tuple table slot code,
>>> perhaps, but I don't see a way to do it more cleanly unless we want
>>> to add some new provisions to that API. Andres, do you have any
>>> thoughts about that?
>>
>> Oh! After nosing around a bit more I remembered systable_recheck_tuple,
>> which is meant for exactly this purpose. So v4 attached.
>>
>> 			regards, tom lane
Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
> Hmm, how about first checking if any invalidated shared messages have been
> accepted, then rechecking the tuple's visibility?
> If there is no invalidated shared message accepted during
> 'toast_flatten_tuple',
> there is no need to do then visibility check, then it can save several
> CPU cycles.

Meh, I'd just as soon not add the additional dependency/risk of bugs.
This is an expensive and seldom-taken code path, so I don't think
shaving a few cycles is really important.

			regards, tom lane
I wrote:
> Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
>> Hmm, how about first checking if any invalidated shared messages have been
>> accepted, then rechecking the tuple's visibility?
>> If there is no invalidated shared message accepted during
>> 'toast_flatten_tuple',
>> there is no need to do then visibility check, then it can save several
>> CPU cycles.

> Meh, I'd just as soon not add the additional dependency/risk of bugs.
> This is an expensive and seldom-taken code path, so I don't think
> shaving a few cycles is really important.

It occurred to me that this idea might be more interesting if we
could encapsulate it right into systable_recheck_tuple: something
like having systable_beginscan capture the current
SharedInvalidMessageCounter and save it in the SysScanDesc struct,
then compare in systable_recheck_tuple to possibly short-circuit
that work.  This'd eliminate one of the main bug hazards in the
idea, namely that you might capture SharedInvalidMessageCounter too
late, after something's already happened.  However, the whole idea
only works for catalogs that have catcaches, and the other users of
systable_recheck_tuple are interested in pg_depend which doesn't.
So that put a damper on my enthusiasm for the idea.

			regards, tom lane
On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:
> I wrote:
> > This is uncomfortably much in bed with the tuple table slot code,
> > perhaps, but I don't see a way to do it more cleanly unless we want
> > to add some new provisions to that API.  Andres, do you have any
> > thoughts about that?
>
> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
> which is meant for exactly this purpose.  So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages.  The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the
systable_recheck_tuple() part will change again before the
heap_inplace_update() story is over
(https://postgr.es/m/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).
This is an interesting idea.
Although some catalog tables are not covered by catcaches,
such as pg_depend, when scanning them, if there is any SharedInvalidationMessage, the CatalogSnapshot
will be invalidated and recreated (see "RelationInvalidatesSnapshotsOnly" in syscache.c).
So during a systable scan, it may receive SharedInvalidationMessages and return tuples that
are out of date; systable_recheck_tuple is used in dependency.c for exactly that case.
Tom Lane <tgl@sss.pgh.pa.us> wrote on Sunday, January 14, 2024 at 03:12:
> I wrote:
>> Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
>>> Hmm, how about first checking if any invalidated shared messages have been
>>> accepted, then rechecking the tuple's visibility?
>>> If there is no invalidated shared message accepted during
>>> 'toast_flatten_tuple',
>>> there is no need to do then visibility check, then it can save several
>>> CPU cycles.
>
>> Meh, I'd just as soon not add the additional dependency/risk of bugs.
>> This is an expensive and seldom-taken code path, so I don't think
>> shaving a few cycles is really important.
>
> It occurred to me that this idea might be more interesting if we
> could encapsulate it right into systable_recheck_tuple: something
> like having systable_beginscan capture the current
> SharedInvalidMessageCounter and save it in the SysScanDesc struct,
> then compare in systable_recheck_tuple to possibly short-circuit
> that work.  This'd eliminate one of the main bug hazards in the
> idea, namely that you might capture SharedInvalidMessageCounter too
> late, after something's already happened.  However, the whole idea
> only works for catalogs that have catcaches, and the other users of
> systable_recheck_tuple are interested in pg_depend which doesn't.
> So that put a damper on my enthusiasm for the idea.
>
> 			regards, tom lane
On Sun, Jan 14, 2024 at 12:14:11PM -0800, Noah Misch wrote:
> On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:
> > Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
> > which is meant for exactly this purpose.  So v4 attached.
>
> systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
> general proxy for invalidation messages.  The commit for $SUBJECT (ad98fb1)
> doesn't create any new malfunctions, but I expect the
> systable_recheck_tuple() part will change again before the
> heap_inplace_update() story is over
> (https://postgr.es/m/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).

Commit f9f47f0 (2024-06-27) addressed inplace updates here.
On 25/09/2024 00:20, Noah Misch wrote:
> On Sun, Jan 14, 2024 at 12:14:11PM -0800, Noah Misch wrote:
>> On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:
>>> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
>>> which is meant for exactly this purpose.  So v4 attached.
>>
>> systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
>> general proxy for invalidation messages.  The commit for $SUBJECT (ad98fb1)
>> doesn't create any new malfunctions, but I expect the
>> systable_recheck_tuple() part will change again before the
>> heap_inplace_update() story is over
>> (https://postgr.es/m/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).
>
> Commit f9f47f0 (2024-06-27) addressed inplace updates here.

I started to wonder if there's still an issue with catcache list
entries.  The code to build a CatCList looks like this:

SearchCatCacheList()
    systable_beginscan()
    while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
    {
        ct = CatalogCacheCreateEntry(ntp)
        if (ct == NULL)
        {
            /* 'ntp' was concurrently invalidated, start all over */
        }
    }
    systable_endscan();
    /* create CatCList entry */

CatalogCacheCreateEntry() can accept catcache invalidations when it
opens the toast table, and it now has recheck logic to detect the case
that the tuple it's processing (ntp) is invalidated.  However, isn't it
also possible that it accepts an invalidation message for a tuple that
we had processed in an earlier iteration of the loop?  Or that a new
catalog tuple was inserted that should be part of the list we're
building?

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> CatalogCacheCreateEntry() can accept catcache invalidations when it
> opens the toast table, and it now has recheck logic to detect the case
> that the tuple it's processing (ntp) is invalidated.  However, isn't it
> also possible that it accepts an invalidation message for a tuple that
> we had processed in an earlier iteration of the loop?  Or that a new
> catalog tuple was inserted that should be part of the list we're building?

The expectation is that the list will be built and returned to the
caller, but it's already marked as stale so it will be rebuilt on next
request.  We could consider putting a loop around that, but (a) it
might loop a lot of times, and (b) returning a stale list isn't much
different from the situation where the list-invalidating event arrives
a nanosecond after we finish rather than a nanosecond before.
Ultimately it's the caller's responsibility that the returned list be
consistent enough for its purposes.  It might achieve that by first
taking a lock on a related table, for example.

			regards, tom lane