Re: Postgresql 8.4.1 segfault, backtrace - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Postgresql 8.4.1 segfault, backtrace
Date
Msg-id 23825.1253901694@sss.pgh.pa.us
Whole thread Raw
In response to Re: Postgresql 8.4.1 segfault, backtrace  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Postgresql 8.4.1 segfault, backtrace  (Richard Neill <rn214@hermes.cam.ac.uk>)
List pgsql-bugs
I wrote:
> I'll get you a real fix as soon as I can, but might not be till
> tomorrow.

The attached patch (against 8.4.x) fixes the problem as far as I can
tell.  Please test.

            regards, tom lane

Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.287
diff -c -r1.287 relcache.c
*** src/backend/utils/cache/relcache.c    11 Jun 2009 14:49:05 -0000    1.287
--- src/backend/utils/cache/relcache.c    25 Sep 2009 17:32:02 -0000
***************
*** 1386,1392 ****
       *
       * The data we insert here is pretty incomplete/bogus, but it'll serve to
       * get us launched.  RelationCacheInitializePhase2() will read the real
!      * data from pg_class and replace what we've done here.
       */
      relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE);

--- 1386,1394 ----
       *
       * The data we insert here is pretty incomplete/bogus, but it'll serve to
       * get us launched.  RelationCacheInitializePhase2() will read the real
!      * data from pg_class and replace what we've done here.  Note in particular
!      * that relowner is left as zero; this cues RelationCacheInitializePhase2
!      * that the real data isn't there yet.
       */
      relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE);

***************
*** 2603,2619 ****
       * rows and replace the fake entries with them. Also, if any of the
       * relcache entries have rules or triggers, load that info the hard way
       * since it isn't recorded in the cache file.
       */
      hash_seq_init(&status, RelationIdCache);

      while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
      {
          Relation    relation = idhentry->reldesc;

          /*
           * If it's a faked-up entry, read the real pg_class tuple.
           */
!         if (needNewCacheFile && relation->rd_isnailed)
          {
              HeapTuple    htup;
              Form_pg_class relp;
--- 2605,2635 ----
       * rows and replace the fake entries with them. Also, if any of the
       * relcache entries have rules or triggers, load that info the hard way
       * since it isn't recorded in the cache file.
+      *
+      * Whenever we access the catalogs to read data, there is a possibility
+      * of a shared-inval cache flush causing relcache entries to be removed.
+      * Since hash_seq_search only guarantees to still work after the *current*
+      * entry is removed, it's unsafe to continue the hashtable scan afterward.
+      * We handle this by restarting the scan from scratch after each access.
+      * This is theoretically O(N^2), but the number of entries that actually
+      * need to be fixed is small enough that it doesn't matter.
       */
      hash_seq_init(&status, RelationIdCache);

      while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
      {
          Relation    relation = idhentry->reldesc;
+         bool        restart = false;
+
+         /*
+          * Make sure *this* entry doesn't get flushed while we work with it.
+          */
+         RelationIncrementReferenceCount(relation);

          /*
           * If it's a faked-up entry, read the real pg_class tuple.
           */
!         if (relation->rd_rel->relowner == InvalidOid)
          {
              HeapTuple    htup;
              Form_pg_class relp;
***************
*** 2630,2636 ****
               * Copy tuple to relation->rd_rel. (See notes in
               * AllocateRelationDesc())
               */
-             Assert(relation->rd_rel != NULL);
              memcpy((char *) relation->rd_rel, (char *) relp, CLASS_TUPLE_SIZE);

              /* Update rd_options while we have the tuple */
--- 2646,2651 ----
***************
*** 2639,2660 ****
              RelationParseRelOptions(relation, htup);

              /*
!              * Also update the derived fields in rd_att.
               */
!             relation->rd_att->tdtypeid = relp->reltype;
!             relation->rd_att->tdtypmod = -1;    /* unnecessary, but... */
!             relation->rd_att->tdhasoid = relp->relhasoids;

              ReleaseSysCache(htup);
          }

          /*
           * Fix data that isn't saved in relcache cache file.
           */
          if (relation->rd_rel->relhasrules && relation->rd_rules == NULL)
              RelationBuildRuleLock(relation);
          if (relation->rd_rel->relhastriggers && relation->trigdesc == NULL)
              RelationBuildTriggers(relation);
      }

      /*
--- 2654,2710 ----
              RelationParseRelOptions(relation, htup);

              /*
!              * Check the values in rd_att were set up correctly.  (We cannot
!              * just copy them over now: formrdesc must have set up the
!              * rd_att data correctly to start with, because it may already
!              * have been copied into one or more catcache entries.)
               */
!             Assert(relation->rd_att->tdtypeid == relp->reltype);
!             Assert(relation->rd_att->tdtypmod == -1);
!             Assert(relation->rd_att->tdhasoid == relp->relhasoids);

              ReleaseSysCache(htup);
+
+             /* relowner had better be OK now, else we'll loop forever */
+             if (relation->rd_rel->relowner == InvalidOid)
+                 elog(ERROR, "invalid relowner in pg_class entry for \"%s\"",
+                      RelationGetRelationName(relation));
+
+             restart = true;
          }

          /*
           * Fix data that isn't saved in relcache cache file.
+          *
+          * relhasrules or relhastriggers could possibly be wrong or out of
+          * date.  If we don't actually find any rules or triggers, clear the
+          * local copy of the flag so that we don't get into an infinite loop
+          * here.  We don't make any attempt to fix the pg_class entry, though.
           */
          if (relation->rd_rel->relhasrules && relation->rd_rules == NULL)
+         {
              RelationBuildRuleLock(relation);
+             if (relation->rd_rules == NULL)
+                 relation->rd_rel->relhasrules = false;
+             restart = true;
+         }
          if (relation->rd_rel->relhastriggers && relation->trigdesc == NULL)
+         {
              RelationBuildTriggers(relation);
+             if (relation->trigdesc == NULL)
+                 relation->rd_rel->relhastriggers = false;
+             restart = true;
+         }
+
+         /* Release hold on the relation */
+         RelationDecrementReferenceCount(relation);
+
+         /* Now, restart the hashtable scan if needed */
+         if (restart)
+         {
+             hash_seq_term(&status);
+             hash_seq_init(&status, RelationIdCache);
+         }
      }

      /*

pgsql-bugs by date:

Previous
From: "smuffy"
Date:
Subject: BUG #5082: I can't get logfile( achive )
Next
From: Tom Lane
Date:
Subject: Re: Postgresql 8.4.1 segfault, backtrace