Re: Recovering from detoast-related catcache invalidations - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Recovering from detoast-related catcache invalidations
Date
Msg-id 20250107215653.82.nmisch@google.com
Whole thread Raw
In response to Re: Recovering from detoast-related catcache invalidations  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:
> On 14/12/2024 02:06, Heikki Linnakangas wrote:
> > Ok, I missed that. It does not handle the 2nd scenario though: If a new
> > catalog tuple is concurrently inserted that should be part of the list,
> > it is missed.

> > Attached is an injection point test case to reproduce that. If you
> > change the test so that the function's body is shorter, so that it's not
> > toasted, the test passes.

Nice discovery.

> I'm thinking of the attached to fix this. It changes the strategy for
> detecting concurrent cache invalidations. Instead of the "recheck" mechanism
> that was introduced in commit ad98fb1422, keep a stack of "build
> in-progress" entries, and CatCacheInvalidate() invalidate those
> "in-progress" entries in addition to the actual CatCTup and CatCList
> entries.
> 
> My first attempt was to insert the CatCTup or CatCList entry to the catcache
> before starting to build it, marked with a flag to indicate that the entry
> isn't fully built yet. But when I started to write that it got pretty
> invasive, and it felt easier to add another structure to hold the
> in-progress entries instead.

That's similar to how relcache has been doing it (in_progress_list).  I see no
problem applying that technique here.

> not sure I want to
> commit the test with the injection point, but it's useful now to demonstrate
> the bug.)

I'd err on the side of including it.  Apart from some copied comments, the
test looks ready.

On Wed, Dec 25, 2024 at 11:27:38AM +0200, Heikki Linnakangas wrote:
> +++ b/src/test/modules/test_misc/t/007_bugs.pl

test_misc/t/007_bugs.pl could be home to almost anything.  How about naming it
007_catcache_inval.pl?

> @@ -744,6 +770,13 @@ ResetCatalogCache(CatCache *cache)
>  #endif
>          }
>      }
> +
> +    /* Also invalidate any entries that are being built */
> +    for (CatCCreating *e = catcache_creating_stack; e != NULL; e = e->next)
> +    {
> +        if (e->cache == cache)
> +            e->dead = true;
> +    }
>  }

With debug_discard_caches=1, "make check" hangs early with some INSERT using
100% CPU.  The new test file does likewise.  I bet this needs a special case
to short-circuit debug_discard_caches, like RelationCacheInvalidate() has.

> @@ -1665,6 +1698,8 @@ SearchCatCacheList(CatCache *cache,
>      HeapTuple    ntp;
>      MemoryContext oldcxt;
>      int            i;
> +    volatile CatCCreating creating_list;

You could drop the volatile by copying catcache_creating_stack to an
additional var that you reference after longjmp, instead of referencing
creating_list.next after longjmp.  Likewise for the instance of volatile in
CatalogCacheCreateEntry().
https://pubs.opengroup.org/onlinepubs/9799919799/functions/longjmp.html has
the rules.  Using volatile is fine, though.

> +    bool        first_iter = true;

This is okay as non-volatile, but it could just as easily move inside the
PG_TRY.

> @@ -2076,38 +2118,33 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,

> +            PG_TRY();
>              {
> -                matches = equalTuple(before, ntp);
> -                heap_freetuple(before);
> +                dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);

gcc 4.8.5 warns:

catcache.c: In function ‘CatalogCacheCreateEntry’:
catcache.c:2159:29: warning: ‘dtp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   ct->tuple.t_tableOid = dtp->t_tableOid;

I remember some commit of a gcc 4.x warning fix in a recent year, and we do
have buildfarm representation.  Consider silencing it.


The rest looks good.  Thanks.



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Small patch to use pqMsg_Query instead of `Q`
Next
From: Ilia Evdokimov
Date:
Subject: Re: Sample rate added to pg_stat_statements