Cache-flush stress testing - Mailing list pgsql-hackers

From Tom Lane
Subject Cache-flush stress testing
Msg-id 9648.1137708200@sss.pgh.pa.us
Responses Re: Cache-flush stress testing  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-hackers
I've completed a round of stress testing the system for vulnerabilities
to unexpected cache flush events (relcache, catcache, or typcache
entries disappearing while in use).  I'm pleased to report that the 8.1
branch now passes all available regression tests (main, contrib, pl)
with CLOBBER_CACHE_ALWAYS defined as per the attached patch.
I have not had the patience to run a full regression cycle with
CLOBBER_CACHE_RECURSIVELY (I estimate that would take over a week on the
fastest machine I have) but I have gotten through the first dozen or so
tests, and I doubt that completing the full set would find anything not
found by CLOBBER_CACHE_ALWAYS.

HEAD is still broken pending resolution of the lookup_rowtype_tupdesc()
business.  8.0 should be OK but I haven't actually tested it.

I'm still bothered by the likelihood that there are cache-flush bugs in
code paths that are not exercised by the regression tests.  The
CLOBBER_CACHE patch is far too slow to consider enabling on any regular
basis, but it seems that throwing in cache flushes at random intervals,
as in the test program I posted here:
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php
doesn't provide very good test coverage.  Has anyone got any ideas about
better ways to locate such bugs?

        regards, tom lane


Index: inval.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/inval.c,v
retrieving revision 1.74
diff -c -r1.74 inval.c
*** inval.c    22 Nov 2005 18:17:24 -0000    1.74
--- inval.c    19 Jan 2006 21:47:07 -0000
***************
*** 625,630 ****
--- 625,660 ----
  {
      ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage,
                                   InvalidateSystemCaches);
+ 
+     /*
+      * Test code to force cache flushes anytime a flush could happen.
+      *
+      * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a
+      * fairly thorough test that the system contains no cache-flush hazards.
+      * However, it also makes the system unbelievably slow --- the regression
+      * tests take about 100 times longer than normal.
+      *
+      * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY.
+      * This slows things by at least a factor of 10000, so I wouldn't suggest
+      * trying to run the entire regression tests that way.  It's useful to
+      * try a few simple tests, to make sure that cache reload isn't subject
+      * to internal cache-flush hazards, but after you've done a few thousand
+      * recursive reloads it's unlikely you'll learn more.
+      */
+ #if defined(CLOBBER_CACHE_ALWAYS)
+     {
+         static bool in_recursion = false;
+ 
+         if (!in_recursion)
+         {
+             in_recursion = true;
+             InvalidateSystemCaches();
+             in_recursion = false;
+         }
+     }
+ #elif defined(CLOBBER_CACHE_RECURSIVELY)
+     InvalidateSystemCaches();
+ #endif
  }
  
  /*
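
The in_recursion guard in the patch above, and the kind of hazard it is meant to expose, can be illustrated outside PostgreSQL with a toy model. What follows is a minimal sketch, not PostgreSQL code: ToyEntry, toy_invalidate, toy_accept_invalidation, toy_lookup, and toy_safe_natts are all hypothetical names invented for this illustration. It assumes that every lookup is a potential flush point, which is roughly what CLOBBER_CACHE_ALWAYS simulates, and it shows one defensive pattern: copy the needed field out of a cache entry before reaching the next flush point.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy cache holding a single entry that is rebuilt on demand. */
typedef struct ToyEntry
{
    int  natts;
    char name[16];
} ToyEntry;

static ToyEntry *toy_cache = NULL;
static int       toy_flush_count = 0;

/* Toy stand-in for a full cache reset: clobber the entry, then free it. */
static void
toy_invalidate(void)
{
    if (toy_cache != NULL)
    {
        /* Overwrite first so stale pointers see garbage, not old data. */
        memset(toy_cache, 0x7F, sizeof(ToyEntry));
        free(toy_cache);
        toy_cache = NULL;
    }
    toy_flush_count++;
}

/* Same shape as the guard in the patch: force a flush, but only one level. */
static void
toy_accept_invalidation(void)
{
    static int in_recursion = 0;

    if (!in_recursion)
    {
        in_recursion = 1;
        toy_invalidate();
        in_recursion = 0;
    }
}

/* Assumption of this sketch: every lookup is a possible flush point. */
static ToyEntry *
toy_lookup(void)
{
    toy_accept_invalidation();
    if (toy_cache == NULL)
    {
        toy_cache = malloc(sizeof(ToyEntry));
        assert(toy_cache != NULL);
        toy_cache->natts = 3;
        strcpy(toy_cache->name, "toy_rel");
    }
    return toy_cache;
}

/* Safe pattern: copy what is needed before the next flush point. */
static int
toy_safe_natts(void)
{
    ToyEntry *e = toy_lookup();
    int       natts = e->natts;   /* copied while e is still valid */

    toy_lookup();                 /* forced flush: e now points at freed memory */
    /* Reading e->natts here would be the cache-flush hazard. */
    return natts;
}
```

Under these assumptions, toy_safe_natts() still returns the value it copied even though the entry it read from was clobbered and freed by the second lookup. Code that kept using the pointer instead would be reading freed memory, which is the class of bug the forced flushes are intended to make visible.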

