Thread: Recovering from detoast-related catcache invalidations
In bug #18163 [1], Alexander proved the misgivings I had in [2] about
catcache detoasting being possibly unsafe:

>> BTW, while nosing around I found what seems like a very nasty related
>> bug.  Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields.  CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls.  What if one of those should have
>> invalidated this tuple?  We will not notice, because it's not in
>> the hashtable yet.  When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.

Attached is a POC patch for fixing this.  The idea is that if we get an
invalidation while trying to detoast a catalog tuple, we should go around
and re-read the tuple a second time to get an up-to-date version (or,
possibly, discover that it no longer exists).  In the case of
SearchCatCacheList, we have to drop and reload the entire catcache list,
but fortunately not a lot of new code is needed.

The detection of "get an invalidation" could be refined: what I did here
is to check for any advance of SharedInvalidMessageCounter, which clearly
will have a significant number of false positives.  However, the only way
I see to make that a lot better is to temporarily create a placeholder
catcache entry (probably a negative one) with the same keys, and then see
if it got marked dead.  This seems a little expensive, plus I'm afraid
that it'd be actively wrong in the recursive-lookup cases that the
existing comment in SearchCatCacheMiss is talking about (that is, the
catcache entry might mislead any recursive lookup that happens).
Moreover, if we did do something like that then the new code paths would
be essentially untested.
As the patch stands, the reload path seems to get taken 10 to 20 times
during a "make installcheck-parallel" run of the core regression tests
(out of about 150 times that catcache detoasting is required).  Probably
all of those are false-positive cases, but at least they're exercising
the logic.  So I'm inclined to leave it like this, but perhaps somebody
else will have a different opinion.

(BTW, there's a fair amount of existing catcache.c code that will need to
be indented another tab stop, but in the interests of keeping the patch
legible I didn't do that yet.)

Comments?

			regards, tom lane

[1] https://www.postgresql.org/message-id/18163-859bad19a43edcf6%40postgresql.org
[2] https://www.postgresql.org/message-id/1389919.1697144487%40sss.pgh.pa.us

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 1aacb736c2..8255162a72 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1319,6 +1319,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1327,16 +1328,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1351,9 +1342,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
 								  cache->cc_indexoid,
 								  IndexScanOK(cache, cur_skey),
@@ -1362,12 +1372,19 @@ SearchCatCacheMiss(CatCache *cache,
 								  cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
 		ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 									 hashValue, hashIndex,
 									 false);
+		/* upon failure, we must start the scan over */
+		if (ct == NULL)
+		{
+			stale = true;
+			break;
+		}
 		/* immediately set the refcount to 1 */
 		ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
 		ct->refcount++;
@@ -1376,6 +1393,7 @@ SearchCatCacheMiss(CatCache *cache,
 	}
 
 	systable_endscan(scandesc);
+	} while (stale);
 
 	table_close(relation, AccessShareLock);
 
@@ -1398,6 +1416,9 @@ SearchCatCacheMiss(CatCache *cache,
 								 hashValue, hashIndex,
 								 true);
 
+	/* Creating a negative cache entry shouldn't fail */
+	Assert(ct != NULL);
+
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
 			   cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): put neg entry in bucket %d",
@@ -1607,6 +1628,7 @@ SearchCatCacheList(CatCache *cache,
 	 */
 	ResourceOwnerEnlargeCatCacheListRefs(CurrentResourceOwner);
 
+	/* ctlist must be valid throughout the PG_TRY block */
 	ctlist = NIL;
 
 	PG_TRY();
@@ -1614,19 +1636,23 @@ SearchCatCacheList(CatCache *cache,
 		ScanKeyData cur_skey[CATCACHE_MAXKEYS];
 		Relation	relation;
 		SysScanDesc scandesc;
-
-		/*
-		 * Ok, need to make a lookup in the relation, copy the scankey and
-		 * fill out any per-call fields.
-		 */
-		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-		cur_skey[0].sk_argument = v1;
-		cur_skey[1].sk_argument = v2;
-		cur_skey[2].sk_argument = v3;
-		cur_skey[3].sk_argument = v4;
+		bool		stale;
 
 		relation = table_open(cache->cc_reloid, AccessShareLock);
 
+		do
+		{
+			/*
+			 * Ok, need to make a lookup in the relation, copy the scankey and
+			 * fill out any per-call fields.  (We must re-do this when
+			 * retrying, because systable_beginscan scribbles on the scankey.)
+			 */
+			memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+			cur_skey[0].sk_argument = v1;
+			cur_skey[1].sk_argument = v2;
+			cur_skey[2].sk_argument = v3;
+			cur_skey[3].sk_argument = v4;
+
 		scandesc = systable_beginscan(relation,
 									  cache->cc_indexoid,
 									  IndexScanOK(cache, cur_skey),
@@ -1637,6 +1663,8 @@ SearchCatCacheList(CatCache *cache,
 		/* The list will be ordered iff we are doing an index scan */
 		ordered = (scandesc->irel != NULL);
 
+		stale = false;
+
 		while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 		{
 			uint32		hashValue;
@@ -1682,6 +1710,30 @@ SearchCatCacheList(CatCache *cache,
 				ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 											 hashValue, hashIndex,
 											 false);
+				/* upon failure, we must start the scan over */
+				if (ct == NULL)
+				{
+					/*
+					 * Release refcounts on any items we already had.  We dare
+					 * not try to free them if they're now unreferenced, since
+					 * an error while doing that would result in the PG_CATCH
+					 * below doing extra refcount decrements.  Besides, we'll
+					 * likely re-adopt those items in the next iteration, so
+					 * it's not worth complicating matters to try to get rid
+					 * of them.
+					 */
+					foreach(ctlist_item, ctlist)
+					{
+						ct = (CatCTup *) lfirst(ctlist_item);
+						Assert(ct->c_list == NULL);
+						Assert(ct->refcount > 0);
+						ct->refcount--;
+					}
+					/* Reset ctlist in preparation for new try */
+					ctlist = NIL;
+					stale = true;
+					break;
+				}
 			}
 
 			/* Careful here: add entry to ctlist, then bump its refcount */
@@ -1691,6 +1743,7 @@ SearchCatCacheList(CatCache *cache,
 		}
 
 		systable_endscan(scandesc);
+		} while (stale);
 
 		table_close(relation, AccessShareLock);
 
@@ -1797,6 +1850,10 @@ ReleaseCatCacheList(CatCList *list)
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
+ *
+ * Returns NULL if we attempt to detoast the tuple and observe that it
+ * became stale.  (This cannot happen for a negative entry.)  Caller must
+ * retry the tuple lookup in that case.
  */
 static CatCTup *
 CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
@@ -1820,9 +1877,24 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	 * entry, and also protects us against the possibility of the toast
 	 * tuples being freed before we attempt to fetch them, in case of
 	 * something using a slightly stale catcache entry.
+	 *
+	 * If we see SharedInvalidMessageCounter advance during detoasting,
+	 * assume that the tuple we are interested in might have been
+	 * invalidated, and report failure.  (Since we don't yet have a
+	 * catcache entry representing the tuple, it's impossible to tell for
+	 * sure, and we have to be conservative.)
 	 */
 	if (HeapTupleHasExternal(ntp))
+	{
+		uint64		inval_count = SharedInvalidMessageCounter;
+
 		dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
+		if (inval_count != SharedInvalidMessageCounter)
+		{
+			heap_freetuple(dtp);
+			return NULL;
+		}
+	}
 	else
 		dtp = ntp;
I wrote:
> In bug #18163 [1], Alexander proved the misgivings I had in [2]
> about catcache detoasting being possibly unsafe:
> ...
> Attached is a POC patch for fixing this.

The cfbot pointed out that this needed a rebase.  No substantive changes.

			regards, tom lane

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 2e2e4d9f1f..2d72e8106c 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1372,6 +1372,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1380,16 +1381,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1404,9 +1395,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
 								  cache->cc_indexoid,
 								  IndexScanOK(cache, cur_skey),
@@ -1415,12 +1425,19 @@ SearchCatCacheMiss(CatCache *cache,
 								  cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
 		ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 									 hashValue, hashIndex,
 									 false);
+		/* upon failure, we must start the scan over */
+		if (ct == NULL)
+		{
+			stale = true;
+			break;
+		}
 		/* immediately set the refcount to 1 */
 		ResourceOwnerEnlarge(CurrentResourceOwner);
 		ct->refcount++;
@@ -1429,6 +1446,7 @@ SearchCatCacheMiss(CatCache *cache,
 	}
 
 	systable_endscan(scandesc);
+	} while (stale);
 
 	table_close(relation, AccessShareLock);
 
@@ -1451,6 +1469,9 @@ SearchCatCacheMiss(CatCache *cache,
 								 hashValue, hashIndex,
 								 true);
 
+	/* Creating a negative cache entry shouldn't fail */
+	Assert(ct != NULL);
+
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
 			   cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): put neg entry in bucket %d",
@@ -1663,7 +1684,8 @@ SearchCatCacheList(CatCache *cache,
 	 * We have to bump the member refcounts temporarily to ensure they won't
 	 * get dropped from the cache while loading other members. We use a PG_TRY
 	 * block to ensure we can undo those refcounts if we get an error before
-	 * we finish constructing the CatCList.
+	 * we finish constructing the CatCList.  ctlist must be valid throughout
+	 * the PG_TRY block.
 	 */
 	ctlist = NIL;
 
@@ -1672,19 +1694,23 @@ SearchCatCacheList(CatCache *cache,
 		ScanKeyData cur_skey[CATCACHE_MAXKEYS];
 		Relation	relation;
 		SysScanDesc scandesc;
-
-		/*
-		 * Ok, need to make a lookup in the relation, copy the scankey and
-		 * fill out any per-call fields.
-		 */
-		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-		cur_skey[0].sk_argument = v1;
-		cur_skey[1].sk_argument = v2;
-		cur_skey[2].sk_argument = v3;
-		cur_skey[3].sk_argument = v4;
+		bool		stale;
 
 		relation = table_open(cache->cc_reloid, AccessShareLock);
 
+		do
+		{
+			/*
+			 * Ok, need to make a lookup in the relation, copy the scankey and
+			 * fill out any per-call fields.  (We must re-do this when
+			 * retrying, because systable_beginscan scribbles on the scankey.)
+			 */
+			memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+			cur_skey[0].sk_argument = v1;
+			cur_skey[1].sk_argument = v2;
+			cur_skey[2].sk_argument = v3;
+			cur_skey[3].sk_argument = v4;
+
 		scandesc = systable_beginscan(relation,
 									  cache->cc_indexoid,
 									  IndexScanOK(cache, cur_skey),
@@ -1695,6 +1721,8 @@ SearchCatCacheList(CatCache *cache,
 		/* The list will be ordered iff we are doing an index scan */
 		ordered = (scandesc->irel != NULL);
 
+		stale = false;
+
 		while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 		{
 			uint32		hashValue;
@@ -1740,6 +1768,30 @@ SearchCatCacheList(CatCache *cache,
 				ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 											 hashValue, hashIndex,
 											 false);
+				/* upon failure, we must start the scan over */
+				if (ct == NULL)
+				{
+					/*
+					 * Release refcounts on any items we already had.  We dare
+					 * not try to free them if they're now unreferenced, since
+					 * an error while doing that would result in the PG_CATCH
+					 * below doing extra refcount decrements.  Besides, we'll
+					 * likely re-adopt those items in the next iteration, so
+					 * it's not worth complicating matters to try to get rid
+					 * of them.
+					 */
+					foreach(ctlist_item, ctlist)
+					{
+						ct = (CatCTup *) lfirst(ctlist_item);
+						Assert(ct->c_list == NULL);
+						Assert(ct->refcount > 0);
+						ct->refcount--;
+					}
+					/* Reset ctlist in preparation for new try */
+					ctlist = NIL;
+					stale = true;
+					break;
+				}
 			}
 
 			/* Careful here: add entry to ctlist, then bump its refcount */
@@ -1749,6 +1801,7 @@ SearchCatCacheList(CatCache *cache,
 		}
 
 		systable_endscan(scandesc);
+		} while (stale);
 
 		table_close(relation, AccessShareLock);
 
@@ -1865,6 +1918,10 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner)
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
+ *
+ * Returns NULL if we attempt to detoast the tuple and observe that it
+ * became stale.  (This cannot happen for a negative entry.)  Caller must
+ * retry the tuple lookup in that case.
  */
 static CatCTup *
 CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
@@ -1888,9 +1945,24 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	 * entry, and also protects us against the possibility of the toast
 	 * tuples being freed before we attempt to fetch them, in case of
 	 * something using a slightly stale catcache entry.
+	 *
+	 * If we see SharedInvalidMessageCounter advance during detoasting,
+	 * assume that the tuple we are interested in might have been
+	 * invalidated, and report failure.  (Since we don't yet have a
+	 * catcache entry representing the tuple, it's impossible to tell for
+	 * sure, and we have to be conservative.)
 	 */
 	if (HeapTupleHasExternal(ntp))
+	{
+		uint64		inval_count = SharedInvalidMessageCounter;
+
 		dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
+		if (inval_count != SharedInvalidMessageCounter)
+		{
+			heap_freetuple(dtp);
+			return NULL;
+		}
+	}
 	else
 		dtp = ntp;
Hi,
>> BTW, while nosing around I found what seems like a very nasty related
>> bug. Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields. CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls. What if one of those should have
>> invalidated this tuple? We will not notice, because it's not in
>> the hashtable yet. When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.
I spent some time trying to understand the bug, and I can now reproduce it locally with the following steps:
Step 1:
Create a function called 'test' with a body long enough that it must be stored in a TOAST table,
and put it in schema 'yy' with "alter function test set schema yy";
Step 2:
I added a breakpoint at 'toast_flatten_tuple' in session 1,
then executed the following SQL:
----------
set search_path='public';
alter function test set schema xx;
----------
Step 3:
When session 1 stops at the breakpoint, I open session 2 and execute
-----------
set search_path = 'yy';
alter function test set schema public;
-----------
Step 4:
Resume session 1; it reports "ERROR: could not find a function named "test"".
Step 5:
Continue to execute "alter function test set schema xx;" in session 1. It still fails with the same error, even though function 'test' now belongs to schema 'public'.
Obviously, in session 1, the "test" proc tuple in the cache is outdated.
>> The detection of "get an invalidation" could be refined: what I did
>> here is to check for any advance of SharedInvalidMessageCounter,
>> which clearly will have a significant number of false positives.
>> However, the only way I see to make that a lot better is to
>> temporarily create a placeholder catcache entry (probably a negative
>> one) with the same keys, and then see if it got marked dead.
>> This seems a little expensive, plus I'm afraid that it'd be actively
>> wrong in the recursive-lookup cases that the existing comment in
>> SearchCatCacheMiss is talking about (that is, the catcache entry
>> might mislead any recursive lookup that happens).
I have reviewed your patch, and it looks good. But instead of checking for any advance of SharedInvalidMessageCounter (which is a little expensive when the invalidation message is unrelated to the current tuple), I have another idea: we can recheck the visibility of the tuple with the CatalogSnapshot (the CatalogSnapshot must be refreshed whenever there are SharedInvalidMessages). If the tuple is not visible, we re-fetch it; otherwise, we can continue to use it, since it is not outdated.
I added a commit based on your patch and attached it.
Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
>>> The detection of "get an invalidation" could be refined: what I did
>>> here is to check for any advance of SharedInvalidMessageCounter,
>>> which clearly will have a significant number of false positives.

> I have reviewed your patch, and it looks good. But instead of checking for
> any advance of SharedInvalidMessageCounter ( if the invalidate message is
> not related to the current tuple, it is a little expensive) I have another
> idea: we can recheck the visibility of the tuple with CatalogSnapshot(the
> CatalogSnapthot must be refreshed if there is any SharedInvalidMessages) if
> it is not visible, we re-fetch the tuple, otherwise, we can continue to use
> it as it is not outdated.

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable.  Isn't there a cleaner way to make this check?

Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
than GetCatalogSnapshot is the right thing, because the catcaches
use the latter.

			regards, tom lane
> Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
> than GetCatalogSnapshot is the right thing, because the catcaches
> use the latter.
Yes, you are right; we should use GetCatalogSnapshot here.
> Maybe, but that undocumented hack in SetHintBits seems completely
> unacceptable. Isn't there a cleaner way to make this check?
Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check whether the tuple has been deleted.
Since the tuple's xmin must have been committed, we just need to check whether its xmax is committed,
like below:
------------
@@ -1956,9 +1956,11 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
*/
if (HeapTupleHasExternal(ntp))
{
+ TransactionId xmax;
dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
- if (!HeapTupleSatisfiesVisibility(ntp, GetNonHistoricCatalogSnapshot(cache->cc_reloid), InvalidBuffer))
+ xmax = HeapTupleHeaderGetUpdateXid(ntp->t_data);
+ if (TransactionIdIsValid(xmax) && TransactionIdDidCommit(xmax))
{
heap_freetuple(dtp);
return NULL;
------------
I'm not quite sure the code is correct; I cannot yet fully understand 'HeapTupleHeaderGetUpdateXid', and I need more time to dive into it.
Any thoughts?
Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
>> Maybe, but that undocumented hack in SetHintBits seems completely
>> unacceptable.  Isn't there a cleaner way to make this check?

> Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check if the
> tuple has been deleted.
> As the tuple's xmin must been committed, so we just need to check if its
> xmax is committed,

I'm not super thrilled with that.  Something I realized last night is
that your proposal only works if "ntp" is pointing directly into the
catalog's disk buffers.  If something earlier than this code had made a
local-memory copy of the catalog tuple, then it's possible that its
header fields (particularly xmax) are out of date compared to shared
buffers and would fail to tell us that some other process just
invalidated the tuple.  Now in fact, with the current implementation of
syscache_getnext() the result is actually a live tuple and so we can
expect to see any relevant updates.  But I think we'd better add some
Asserts that that's so; and that also provides us with a way to call
HeapTupleSatisfiesVisibility fully legally, because we can get the
buffer reference out of the scan descriptor too.

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want to
add some new provisions to that API.  Andres, do you have any thoughts
about that?

Anyway, this approach gets rid of false positives, which is great for
performance and bad for testing.  Code coverage says that now we never
hit the failure paths during regression tests, which is unsurprising,
but I'm not very comfortable with leaving those paths unexercised.  I
tried to make an isolation test to exercise them, but there's no good
way at the SQL level to get a session to block during the detoast step.
LOCK TABLE on some catalog's toast table would do, but we disallow it.
I thought about adding a small C function to regress.so to take out such
a lock, but we have no infrastructure for referencing regress.so from
isolation tests.  What I ended up doing is adding a random failure about
0.1% of the time in USE_ASSERT_CHECKING builds --- that's intellectually
ugly for sure, but doing better seems like way more work than it's
worth.

			regards, tom lane

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index c0591cffe5..b1ce7bf513 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -15,6 +15,7 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/heapam.h"
 #include "access/heaptoast.h"
 #include "access/relscan.h"
 #include "access/sysattr.h"
@@ -24,8 +25,10 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
+#include "common/pg_prng.h"
 #include "miscadmin.h"
 #include "port/pg_bitutils.h"
+#include "storage/bufmgr.h"
 #ifdef CATCACHE_STATS
 #include "storage/ipc.h"		/* for on_proc_exit */
 #endif
@@ -90,10 +93,10 @@ static void CatCachePrintStats(int code, Datum arg);
 static void CatCacheRemoveCTup(CatCache *cache, CatCTup *ct);
 static void CatCacheRemoveCList(CatCache *cache, CatCList *cl);
 static void CatalogCacheInitializeCache(CatCache *cache);
-static CatCTup *CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp,
+static CatCTup *CatalogCacheCreateEntry(CatCache *cache,
+										HeapTuple ntp, SysScanDesc scandesc,
 										Datum *arguments,
-										uint32 hashValue, Index hashIndex,
-										bool negative);
+										uint32 hashValue, Index hashIndex);
 static void ReleaseCatCacheWithOwner(HeapTuple tuple,
 									 ResourceOwner resowner);
 static void ReleaseCatCacheListWithOwner(CatCList *list,
 										 ResourceOwner resowner);
@@ -1372,6 +1375,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1380,16 +1384,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1404,9 +1398,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
 								  cache->cc_indexoid,
 								  IndexScanOK(cache, cur_skey),
@@ -1415,12 +1428,18 @@ SearchCatCacheMiss(CatCache *cache,
 								  cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
-		ct = CatalogCacheCreateEntry(cache, ntp, arguments,
-									 hashValue, hashIndex,
-									 false);
+		ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
+									 hashValue, hashIndex);
+		/* upon failure, we must start the scan over */
+		if (ct == NULL)
+		{
+			stale = true;
+			break;
+		}
 		/* immediately set the refcount to 1 */
 		ResourceOwnerEnlarge(CurrentResourceOwner);
 		ct->refcount++;
@@ -1429,6 +1448,7 @@ SearchCatCacheMiss(CatCache *cache,
 	}
 
 	systable_endscan(scandesc);
+	} while (stale);
 
 	table_close(relation, AccessShareLock);
 
@@ -1447,9 +1467,11 @@ SearchCatCacheMiss(CatCache *cache,
 	if (IsBootstrapProcessingMode())
 		return NULL;
 
-	ct = CatalogCacheCreateEntry(cache, NULL, arguments,
-								 hashValue, hashIndex,
-								 true);
+	ct = CatalogCacheCreateEntry(cache, NULL, NULL, arguments,
+								 hashValue, hashIndex);
+
+	/* Creating a negative cache entry shouldn't fail */
+	Assert(ct != NULL);
 
 	CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
 			   cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
@@ -1663,7 +1685,8 @@ SearchCatCacheList(CatCache *cache,
 	 * We have to bump the member refcounts temporarily to ensure they won't
 	 * get dropped from the cache while loading other members. We use a PG_TRY
 	 * block to ensure we can undo those refcounts if we get an error before
-	 * we finish constructing the CatCList.
+	 * we finish constructing the CatCList.  ctlist must be valid throughout
+	 * the PG_TRY block.
 	 */
 	ctlist = NIL;
 
@@ -1672,19 +1695,23 @@ SearchCatCacheList(CatCache *cache,
 		ScanKeyData cur_skey[CATCACHE_MAXKEYS];
 		Relation	relation;
 		SysScanDesc scandesc;
-
-		/*
-		 * Ok, need to make a lookup in the relation, copy the scankey and
-		 * fill out any per-call fields.
-		 */
-		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-		cur_skey[0].sk_argument = v1;
-		cur_skey[1].sk_argument = v2;
-		cur_skey[2].sk_argument = v3;
-		cur_skey[3].sk_argument = v4;
+		bool		stale;
 
 		relation = table_open(cache->cc_reloid, AccessShareLock);
 
+		do
+		{
+			/*
+			 * Ok, need to make a lookup in the relation, copy the scankey and
+			 * fill out any per-call fields.  (We must re-do this when
+			 * retrying, because systable_beginscan scribbles on the scankey.)
+			 */
+			memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+			cur_skey[0].sk_argument = v1;
+			cur_skey[1].sk_argument = v2;
+			cur_skey[2].sk_argument = v3;
+			cur_skey[3].sk_argument = v4;
+
 		scandesc = systable_beginscan(relation,
 									  cache->cc_indexoid,
 									  IndexScanOK(cache, cur_skey),
@@ -1695,6 +1722,8 @@ SearchCatCacheList(CatCache *cache,
 		/* The list will be ordered iff we are doing an index scan */
 		ordered = (scandesc->irel != NULL);
 
+		stale = false;
+
 		while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 		{
 			uint32		hashValue;
@@ -1737,9 +1766,32 @@ SearchCatCacheList(CatCache *cache,
 			if (!found)
 			{
 				/* We didn't find a usable entry, so make a new one */
-				ct = CatalogCacheCreateEntry(cache, ntp, arguments,
-											 hashValue, hashIndex,
-											 false);
+				ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
+											 hashValue, hashIndex);
+				/* upon failure, we must start the scan over */
+				if (ct == NULL)
+				{
+					/*
+					 * Release refcounts on any items we already had.  We dare
+					 * not try to free them if they're now unreferenced, since
+					 * an error while doing that would result in the PG_CATCH
+					 * below doing extra refcount decrements.  Besides, we'll
+					 * likely re-adopt those items in the next iteration, so
+					 * it's not worth complicating matters to try to get rid
+					 * of them.
+					 */
+					foreach(ctlist_item, ctlist)
+					{
+						ct = (CatCTup *) lfirst(ctlist_item);
+						Assert(ct->c_list == NULL);
+						Assert(ct->refcount > 0);
+						ct->refcount--;
+					}
+					/* Reset ctlist in preparation for new try */
+					ctlist = NIL;
+					stale = true;
+					break;
+				}
 			}
 
 			/* Careful here: add entry to ctlist, then bump its refcount */
@@ -1749,6 +1801,7 @@ SearchCatCacheList(CatCache *cache,
 		}
 
 		systable_endscan(scandesc);
+		} while (stale);
 
 		table_close(relation, AccessShareLock);
 
@@ -1865,22 +1918,41 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner)
  * CatalogCacheCreateEntry
  *		Create a new CatCTup entry, copying the given HeapTuple and other
  *		supplied data into it.  The new entry initially has refcount 0.
+ *
+ * To create a normal cache entry, ntp must be the HeapTuple just fetched
+ * from scandesc, and "arguments" is not used.  To create a negative cache
+ * entry, pass NULL for ntp and scandesc; then "arguments" is the cache
+ * keys to use.  In either case, hashValue/hashIndex are the hash values
+ * computed from the cache keys.
+ *
+ * Returns NULL if we attempt to detoast the tuple and observe that it
+ * became stale.  (This cannot happen for a negative entry.)  Caller must
+ * retry the tuple lookup in that case.
  */
 static CatCTup *
-CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
-						uint32 hashValue, Index hashIndex,
-						bool negative)
+CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,
+						Datum *arguments,
+						uint32 hashValue, Index hashIndex)
 {
 	CatCTup    *ct;
 	HeapTuple	dtp;
 	MemoryContext oldcxt;
 
-	/* negative entries have no tuple associated */
 	if (ntp)
 	{
 		int			i;
 
-		Assert(!negative);
+		/*
+		 * The visibility check below essentially never fails during our
+		 * regression tests, and there's no easy way to force it to fail for
+		 * testing purposes.  To ensure we have test coverage for the retry
+		 * paths in our callers, make debug builds randomly fail about 0.1% of
+		 * the times through this code path.
+		 */
+#ifdef USE_ASSERT_CHECKING
+		if (pg_prng_uint32(&pg_global_prng_state) <= (PG_UINT32_MAX / 1000))
+			return NULL;
+#endif
 
 		/*
 		 * If there are any out-of-line toasted fields in the tuple, expand
@@ -1888,9 +1960,32 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 		 * entry, and also protects us against the possibility of the toast
 		 * tuples being freed before we attempt to fetch them, in case of
 		 * something using a slightly stale catcache entry.
+		 *
+		 * The tuple could become stale while we are doing toast table access
+		 * (since AcceptInvalidationMessages can run then), so we must recheck
+		 * its visibility afterwards.  To do that, we must be undesirably
+		 * familiar with the fact that systable_getnext() will return a direct
+		 * pointer into a catalog disk buffer.  This code is likely to need
+		 * work if we ever want to support non-heap catalogs.
*/ if (HeapTupleHasExternal(ntp)) + { + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) scandesc->slot; + + /* Assert that we have a live pointer to a pinned catalog buffer */ + Assert(TTS_IS_BUFFERTUPLE(scandesc->slot)); + Assert(ntp == bslot->base.tuple); + Assert(BufferIsValid(bslot->buffer)); + dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc); + if (!HeapTupleSatisfiesVisibility(ntp, + GetCatalogSnapshot(cache->cc_reloid), + bslot->buffer)) + { + heap_freetuple(dtp); + return NULL; + } + } else dtp = ntp; @@ -1929,7 +2024,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, } else { - Assert(negative); + /* Set up keys for a negative cache entry */ oldcxt = MemoryContextSwitchTo(CacheMemoryContext); ct = (CatCTup *) palloc(sizeof(CatCTup)); @@ -1951,7 +2046,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, ct->c_list = NULL; ct->refcount = 0; /* for the moment */ ct->dead = false; - ct->negative = negative; + ct->negative = (ntp == NULL); ct->hash_value = hashValue; dlist_push_head(&cache->cc_bucket[hashIndex], &ct->cache_elem);
I wrote: > This is uncomfortably much in bed with the tuple table slot code, > perhaps, but I don't see a way to do it more cleanly unless we want > to add some new provisions to that API. Andres, do you have any > thoughts about that? Oh! After nosing around a bit more I remembered systable_recheck_tuple, which is meant for exactly this purpose. So v4 attached. regards, tom lane diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index c0591cffe5..0dcd45d2f0 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -24,6 +24,7 @@ #include "catalog/pg_operator.h" #include "catalog/pg_type.h" #include "common/hashfn.h" +#include "common/pg_prng.h" #include "miscadmin.h" #include "port/pg_bitutils.h" #ifdef CATCACHE_STATS @@ -90,10 +91,10 @@ static void CatCachePrintStats(int code, Datum arg); static void CatCacheRemoveCTup(CatCache *cache, CatCTup *ct); static void CatCacheRemoveCList(CatCache *cache, CatCList *cl); static void CatalogCacheInitializeCache(CatCache *cache); -static CatCTup *CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, +static CatCTup *CatalogCacheCreateEntry(CatCache *cache, + HeapTuple ntp, SysScanDesc scandesc, Datum *arguments, - uint32 hashValue, Index hashIndex, - bool negative); + uint32 hashValue, Index hashIndex); static void ReleaseCatCacheWithOwner(HeapTuple tuple, ResourceOwner resowner); static void ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner); @@ -1372,6 +1373,7 @@ SearchCatCacheMiss(CatCache *cache, SysScanDesc scandesc; HeapTuple ntp; CatCTup *ct; + bool stale; Datum arguments[CATCACHE_MAXKEYS]; /* Initialize local parameter array */ @@ -1380,16 +1382,6 @@ SearchCatCacheMiss(CatCache *cache, arguments[2] = v3; arguments[3] = v4; - /* - * Ok, need to make a lookup in the relation, copy the scankey and fill - * out any per-call fields. 
- */ - memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys); - cur_skey[0].sk_argument = v1; - cur_skey[1].sk_argument = v2; - cur_skey[2].sk_argument = v3; - cur_skey[3].sk_argument = v4; - /* * Tuple was not found in cache, so we have to try to retrieve it directly * from the relation. If found, we will add it to the cache; if not @@ -1404,9 +1396,28 @@ SearchCatCacheMiss(CatCache *cache, * will eventually age out of the cache, so there's no functional problem. * This case is rare enough that it's not worth expending extra cycles to * detect. + * + * Another case, which we *must* handle, is that the tuple could become + * outdated during CatalogCacheCreateEntry's attempt to detoast it (since + * AcceptInvalidationMessages can run during TOAST table access). We do + * not want to return already-stale catcache entries, so we loop around + * and do the table scan again if that happens. */ relation = table_open(cache->cc_reloid, AccessShareLock); + do + { + /* + * Ok, need to make a lookup in the relation, copy the scankey and + * fill out any per-call fields. (We must re-do this when retrying, + * because systable_beginscan scribbles on the scankey.) 
+ */ + memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys); + cur_skey[0].sk_argument = v1; + cur_skey[1].sk_argument = v2; + cur_skey[2].sk_argument = v3; + cur_skey[3].sk_argument = v4; + scandesc = systable_beginscan(relation, cache->cc_indexoid, IndexScanOK(cache, cur_skey), @@ -1415,12 +1426,18 @@ SearchCatCacheMiss(CatCache *cache, cur_skey); ct = NULL; + stale = false; while (HeapTupleIsValid(ntp = systable_getnext(scandesc))) { - ct = CatalogCacheCreateEntry(cache, ntp, arguments, - hashValue, hashIndex, - false); + ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL, + hashValue, hashIndex); + /* upon failure, we must start the scan over */ + if (ct == NULL) + { + stale = true; + break; + } /* immediately set the refcount to 1 */ ResourceOwnerEnlarge(CurrentResourceOwner); ct->refcount++; @@ -1429,6 +1446,7 @@ SearchCatCacheMiss(CatCache *cache, } systable_endscan(scandesc); + } while (stale); table_close(relation, AccessShareLock); @@ -1447,9 +1465,11 @@ SearchCatCacheMiss(CatCache *cache, if (IsBootstrapProcessingMode()) return NULL; - ct = CatalogCacheCreateEntry(cache, NULL, arguments, - hashValue, hashIndex, - true); + ct = CatalogCacheCreateEntry(cache, NULL, NULL, arguments, + hashValue, hashIndex); + + /* Creating a negative cache entry shouldn't fail */ + Assert(ct != NULL); CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples", cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup); @@ -1663,7 +1683,8 @@ SearchCatCacheList(CatCache *cache, * We have to bump the member refcounts temporarily to ensure they won't * get dropped from the cache while loading other members. We use a PG_TRY * block to ensure we can undo those refcounts if we get an error before - * we finish constructing the CatCList. + * we finish constructing the CatCList. ctlist must be valid throughout + * the PG_TRY block. 
*/ ctlist = NIL; @@ -1672,19 +1693,23 @@ SearchCatCacheList(CatCache *cache, ScanKeyData cur_skey[CATCACHE_MAXKEYS]; Relation relation; SysScanDesc scandesc; - - /* - * Ok, need to make a lookup in the relation, copy the scankey and - * fill out any per-call fields. - */ - memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys); - cur_skey[0].sk_argument = v1; - cur_skey[1].sk_argument = v2; - cur_skey[2].sk_argument = v3; - cur_skey[3].sk_argument = v4; + bool stale; relation = table_open(cache->cc_reloid, AccessShareLock); + do + { + /* + * Ok, need to make a lookup in the relation, copy the scankey and + * fill out any per-call fields. (We must re-do this when + * retrying, because systable_beginscan scribbles on the scankey.) + */ + memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys); + cur_skey[0].sk_argument = v1; + cur_skey[1].sk_argument = v2; + cur_skey[2].sk_argument = v3; + cur_skey[3].sk_argument = v4; + scandesc = systable_beginscan(relation, cache->cc_indexoid, IndexScanOK(cache, cur_skey), @@ -1695,6 +1720,8 @@ SearchCatCacheList(CatCache *cache, /* The list will be ordered iff we are doing an index scan */ ordered = (scandesc->irel != NULL); + stale = false; + while (HeapTupleIsValid(ntp = systable_getnext(scandesc))) { uint32 hashValue; @@ -1737,9 +1764,32 @@ SearchCatCacheList(CatCache *cache, if (!found) { /* We didn't find a usable entry, so make a new one */ - ct = CatalogCacheCreateEntry(cache, ntp, arguments, - hashValue, hashIndex, - false); + ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL, + hashValue, hashIndex); + /* upon failure, we must start the scan over */ + if (ct == NULL) + { + /* + * Release refcounts on any items we already had. We dare + * not try to free them if they're now unreferenced, since + * an error while doing that would result in the PG_CATCH + * below doing extra refcount decrements. 
Besides, we'll + * likely re-adopt those items in the next iteration, so + * it's not worth complicating matters to try to get rid + * of them. + */ + foreach(ctlist_item, ctlist) + { + ct = (CatCTup *) lfirst(ctlist_item); + Assert(ct->c_list == NULL); + Assert(ct->refcount > 0); + ct->refcount--; + } + /* Reset ctlist in preparation for new try */ + ctlist = NIL; + stale = true; + break; + } } /* Careful here: add entry to ctlist, then bump its refcount */ @@ -1749,6 +1799,7 @@ SearchCatCacheList(CatCache *cache, } systable_endscan(scandesc); + } while (stale); table_close(relation, AccessShareLock); @@ -1865,22 +1916,42 @@ ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner) * CatalogCacheCreateEntry * Create a new CatCTup entry, copying the given HeapTuple and other * supplied data into it. The new entry initially has refcount 0. + * + * To create a normal cache entry, ntp must be the HeapTuple just fetched + * from scandesc, and "arguments" is not used. To create a negative cache + * entry, pass NULL for ntp and scandesc; then "arguments" is the cache + * keys to use. In either case, hashValue/hashIndex are the hash values + * computed from the cache keys. + * + * Returns NULL if we attempt to detoast the tuple and observe that it + * became stale. (This cannot happen for a negative entry.) Caller must + * retry the tuple lookup in that case. */ static CatCTup * -CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, - uint32 hashValue, Index hashIndex, - bool negative) +CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc, + Datum *arguments, + uint32 hashValue, Index hashIndex) { CatCTup *ct; HeapTuple dtp; MemoryContext oldcxt; - /* negative entries have no tuple associated */ if (ntp) { int i; - Assert(!negative); + /* + * The visibility recheck below essentially never fails during our + * regression tests, and there's no easy way to force it to fail for + * testing purposes. 
To ensure we have test coverage for the retry + * paths in our callers, make debug builds randomly fail about 0.1% of + * the times through this code path, even when there's no toasted + * fields. + */ +#ifdef USE_ASSERT_CHECKING + if (pg_prng_uint32(&pg_global_prng_state) <= (PG_UINT32_MAX / 1000)) + return NULL; +#endif /* * If there are any out-of-line toasted fields in the tuple, expand @@ -1890,7 +1961,20 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, * something using a slightly stale catcache entry. */ if (HeapTupleHasExternal(ntp)) + { dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc); + + /* + * The tuple could become stale while we are doing toast table + * access (since AcceptInvalidationMessages can run then), so we + * must recheck its visibility afterwards. + */ + if (!systable_recheck_tuple(scandesc, ntp)) + { + heap_freetuple(dtp); + return NULL; + } + } else dtp = ntp; @@ -1929,7 +2013,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, } else { - Assert(negative); + /* Set up keys for a negative cache entry */ oldcxt = MemoryContextSwitchTo(CacheMemoryContext); ct = (CatCTup *) palloc(sizeof(CatCTup)); @@ -1951,7 +2035,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, ct->c_list = NULL; ct->refcount = 0; /* for the moment */ ct->dead = false; - ct->negative = negative; + ct->negative = (ntp == NULL); ct->hash_value = hashValue; dlist_push_head(&cache->cc_bucket[hashIndex], &ct->cache_elem);
Great! That's exactly what we need.
The patch LGTM, +1
Tom Lane <tgl@sss.pgh.pa.us> wrote on Saturday, January 13, 2024 at 04:47:
> I wrote:
>> This is uncomfortably much in bed with the tuple table slot code,
>> perhaps, but I don't see a way to do it more cleanly unless we want
>> to add some new provisions to that API. Andres, do you have any
>> thoughts about that?
>
> Oh! After nosing around a bit more I remembered systable_recheck_tuple,
> which is meant for exactly this purpose. So v4 attached.
>
> 			regards, tom lane
Hmm, how about first checking whether any invalidated shared messages have been accepted, and only then rechecking the tuple's visibility?
If no invalidated shared message was accepted during 'toast_flatten_tuple', there is no need to do the visibility check, which can save several
CPU cycles.
----
if (inval_count != SharedInvalidMessageCounter && !systable_recheck_tuple(scandesc, ntp))
{
heap_freetuple(dtp);
return NULL;
}
----
Xiaoran Wang <fanfuxiaoran@gmail.com> wrote on Saturday, January 13, 2024 at 13:16:
> Great! That's exactly what we need.
> The patch LGTM, +1
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote on Saturday, January 13, 2024 at 04:47:
>> I wrote:
>>> This is uncomfortably much in bed with the tuple table slot code,
>>> perhaps, but I don't see a way to do it more cleanly unless we want
>>> to add some new provisions to that API. Andres, do you have any
>>> thoughts about that?
>>
>> Oh! After nosing around a bit more I remembered systable_recheck_tuple,
>> which is meant for exactly this purpose. So v4 attached.
>>
>> 			regards, tom lane
Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
> Hmm, how about first checking if any invalidated shared messages have been
> accepted, then rechecking the tuple's visibility?
> If there is no invalidated shared message accepted during
> 'toast_flatten_tuple',
> there is no need to do then visibility check, then it can save several
> CPU cycles.

Meh, I'd just as soon not add the additional dependency/risk of bugs.
This is an expensive and seldom-taken code path, so I don't think
shaving a few cycles is really important.

			regards, tom lane
I wrote:
> Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
>> Hmm, how about first checking if any invalidated shared messages have been
>> accepted, then rechecking the tuple's visibility?
>> If there is no invalidated shared message accepted during
>> 'toast_flatten_tuple',
>> there is no need to do then visibility check, then it can save several
>> CPU cycles.

> Meh, I'd just as soon not add the additional dependency/risk of bugs.
> This is an expensive and seldom-taken code path, so I don't think
> shaving a few cycles is really important.

It occurred to me that this idea might be more interesting if we
could encapsulate it right into systable_recheck_tuple: something
like having systable_beginscan capture the current
SharedInvalidMessageCounter and save it in the SysScanDesc struct,
then compare in systable_recheck_tuple to possibly short-circuit
that work.  This'd eliminate one of the main bug hazards in the
idea, namely that you might capture SharedInvalidMessageCounter too
late, after something's already happened.  However, the whole idea
only works for catalogs that have catcaches, and the other users of
systable_recheck_tuple are interested in pg_depend which doesn't.
So that put a damper on my enthusiasm for the idea.

			regards, tom lane
On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:
> I wrote:
> > This is uncomfortably much in bed with the tuple table slot code,
> > perhaps, but I don't see a way to do it more cleanly unless we want
> > to add some new provisions to that API.  Andres, do you have any
> > thoughts about that?
>
> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
> which is meant for exactly this purpose.  So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages.  The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the
systable_recheck_tuple() part will change again before the
heap_inplace_update() story is over
(https://postgr.es/m/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).
This is an interesting idea.
Although some catalog tables are not covered by catcaches,
such as pg_depend, when scanning them, if there is any SharedInvalidationMessage, the CatalogSnapshot
will be invalidated and recreated (see "RelationInvalidatesSnapshotsOnly" in syscache.c).
So during a systable scan, it may receive SharedInvalidationMessages and return tuples that
are out of date; systable_recheck_tuple is used in dependency.c for exactly that case.
Tom Lane <tgl@sss.pgh.pa.us> wrote on Sunday, January 14, 2024 at 03:12:
> I wrote:
>> Xiaoran Wang <fanfuxiaoran@gmail.com> writes:
>>> Hmm, how about first checking if any invalidated shared messages have been
>>> accepted, then rechecking the tuple's visibility?
>>> If there is no invalidated shared message accepted during
>>> 'toast_flatten_tuple',
>>> there is no need to do then visibility check, then it can save several
>>> CPU cycles.
>
>> Meh, I'd just as soon not add the additional dependency/risk of bugs.
>> This is an expensive and seldom-taken code path, so I don't think
>> shaving a few cycles is really important.
>
> It occurred to me that this idea might be more interesting if we
> could encapsulate it right into systable_recheck_tuple: something
> like having systable_beginscan capture the current
> SharedInvalidMessageCounter and save it in the SysScanDesc struct,
> then compare in systable_recheck_tuple to possibly short-circuit
> that work.  This'd eliminate one of the main bug hazards in the
> idea, namely that you might capture SharedInvalidMessageCounter too
> late, after something's already happened.  However, the whole idea
> only works for catalogs that have catcaches, and the other users of
> systable_recheck_tuple are interested in pg_depend which doesn't.
> So that put a damper on my enthusiasm for the idea.
>
> 			regards, tom lane
On Sun, Jan 14, 2024 at 12:14:11PM -0800, Noah Misch wrote:
> On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:
> > Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
> > which is meant for exactly this purpose.  So v4 attached.
>
> systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
> general proxy for invalidation messages.  The commit for $SUBJECT (ad98fb1)
> doesn't create any new malfunctions, but I expect the
> systable_recheck_tuple() part will change again before the
> heap_inplace_update() story is over
> (https://postgr.es/m/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).

Commit f9f47f0 (2024-06-27) addressed inplace updates here.
On 25/09/2024 00:20, Noah Misch wrote:
> On Sun, Jan 14, 2024 at 12:14:11PM -0800, Noah Misch wrote:
>> On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:
>>> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
>>> which is meant for exactly this purpose.  So v4 attached.
>>
>> systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
>> general proxy for invalidation messages.  The commit for $SUBJECT (ad98fb1)
>> doesn't create any new malfunctions, but I expect the
>> systable_recheck_tuple() part will change again before the
>> heap_inplace_update() story is over
>> (https://postgr.es/m/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).
>
> Commit f9f47f0 (2024-06-27) addressed inplace updates here.

I started to wonder if there's still an issue with catcache list
entries.  The code to build a CatCList looks like this:

SearchCatCacheList()
    systable_beginscan()
    while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
    {
        ct = CatalogCacheCreateEntry(ntp)
        if (ct == NULL)
        {
            /* 'ntp' was concurrently invalidated, start all over */
        }
    }
    systable_endscan();
    /* create CatCList entry */

CatalogCacheCreateEntry() can accept catcache invalidations when it
opens the toast table, and it now has recheck logic to detect the case
that the tuple it's processing (ntp) is invalidated.  However, isn't it
also possible that it accepts an invalidation message for a tuple that
we had processed in an earlier iteration of the loop?  Or that a new
catalog tuple was inserted that should be part of the list we're
building?

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> CatalogCacheCreateEntry() can accept catcache invalidations when it
> opens the toast table, and it now has recheck logic to detect the case
> that the tuple it's processing (ntp) is invalidated.  However, isn't it
> also possible that it accepts an invalidation message for a tuple that
> we had processed in an earlier iteration of the loop?  Or that a new
> catalog tuple was inserted that should be part of the list we're building?

The expectation is that the list will be built and returned to the
caller, but it's already marked as stale so it will be rebuilt on next
request.  We could consider putting a loop around that, but (a) it
might loop a lot of times, and (b) returning a stale list isn't much
different from the situation where the list-invalidating event arrives
a nanosecond after we finish rather than a nanosecond before.
Ultimately it's the caller's responsibility that the returned list be
consistent enough for its purposes.  It might achieve that by first
taking a lock on a related table, for example.

			regards, tom lane