Re: BUG #5412: test case produced, possible race condition. - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #5412: test case produced, possible race condition.
Date
Msg-id 4BC5A28A.5060902@enterprisedb.com
In response to Re: BUG #5412: test case produced, possible race condition.  (Rusty Conover <rconover@infogears.com>)
Responses Re: BUG #5412: test case produced, possible race condition.  (Rusty Conover <rconover@infogears.com>)
Re: BUG #5412: test case produced, possible race condition.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Rusty Conover wrote:
> It seems like this is a race condition caused by the system catalog cache not being locked properly. I've included a
> Perl script below that causes the crash on my box consistently.
>
> The script forks two different types of processes:
>
> #1 - begin transaction, create a few temp tables and analyze them in a transaction, commit (running in database
> foobar_1)
> #2 - begin transaction, truncate table, insert records into table from select in a transaction, commit (running in
> database foobar_2)
>
> I set up the process to have 10 instances of task #1 and 1 instance of task #2.
>
> Running this script causes the crash of postgres within seconds on my box.

Thanks, that script crashes on my laptop too, with assertions enabled.

According to the comments above RelationClearRelation(), if it's called
with 'rebuild=true', the caller should hold a lock on the relation, i.e.
refcnt > 0. That's not the case in RelationFlushRelation() when it
rebuilds a new relcache entry.

Attached patch should fix that, by incrementing the reference count
while the entry is rebuilt. It also adds an assertion in
RelationClearRelation() to check that the refcnt is indeed > 0.
Comments?
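
To illustrate the idea outside the backend, here is a minimal
standalone sketch of the pin-while-rebuilding pattern. It is not the
relcache code: CacheEntry, entry_pin(), entry_unpin(), entry_clear()
and entry_flush() are hypothetical stand-ins for RelationData,
RelationIncrementReferenceCount(), RelationDecrementReferenceCount(),
RelationClearRelation() and RelationFlushRelation(), and the "rebuild"
is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a relcache entry; not the real RelationData. */
typedef struct CacheEntry
{
    int     refcnt;     /* number of active pins on the entry */
    int     version;    /* bumped each time the entry is rebuilt */
    bool    valid;      /* false once the entry has been dropped */
} CacheEntry;

static void
entry_pin(CacheEntry *e)
{
    e->refcnt++;
}

static void
entry_unpin(CacheEntry *e)
{
    assert(e->refcnt > 0);
    e->refcnt--;
}

/*
 * Rebuild or drop an entry.  The assertion has the same shape as the one
 * the patch adds: a rebuild requires a pinned entry, a drop requires an
 * unpinned one.
 */
static void
entry_clear(CacheEntry *e, bool rebuild)
{
    assert((rebuild && e->refcnt > 0) || (!rebuild && e->refcnt == 0));

    if (rebuild)
        e->version++;       /* placeholder for the real rebuild work */
    else
        e->valid = false;   /* placeholder for removing the entry */
}

/*
 * Flush an entry.  Entries created in the current transaction are always
 * rebuilt, so they are pinned for the duration of the rebuild.  Other
 * entries are rebuilt only if somebody still has them open.
 */
static void
entry_flush(CacheEntry *e, bool new_in_xact)
{
    if (new_in_xact)
    {
        entry_pin(e);
        entry_clear(e, true);
        entry_unpin(e);
    }
    else
    {
        bool    rebuild = (e->refcnt != 0);

        entry_clear(e, rebuild);
    }
}
```

In this sketch, flushing an unpinned entry with new_in_xact = true goes
through the pin/rebuild/unpin path and leaves refcnt at 0 afterwards.
Calling entry_clear(e, true) directly on such an entry would trip the
assertion.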

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.287.2.3
diff -c -r1.287.2.3 relcache.c
*** src/backend/utils/cache/relcache.c    13 Jan 2010 23:07:15 -0000    1.287.2.3
--- src/backend/utils/cache/relcache.c    14 Apr 2010 11:09:23 -0000
***************
*** 1773,1778 ****
--- 1773,1781 ----
  {
      Oid            old_reltype = relation->rd_rel->reltype;

+     Assert((rebuild && relation->rd_refcnt > 0) ||
+            (!rebuild && relation->rd_refcnt == 0));
+
      /*
       * Make sure smgr and lower levels close the relation's files, if they
       * weren't closed already.  If the relation is not getting deleted, the
***************
*** 1968,1975 ****
  static void
  RelationFlushRelation(Relation relation)
  {
-     bool        rebuild;
-
      if (relation->rd_createSubid != InvalidSubTransactionId ||
          relation->rd_newRelfilenodeSubid != InvalidSubTransactionId)
      {
--- 1971,1976 ----
***************
*** 1978,1994 ****
           * forget the "new" status of the relation, which is a useful
           * optimization to have.  Ditto for the new-relfilenode status.
           */
!         rebuild = true;
      }
      else
      {
          /*
           * Pre-existing rels can be dropped from the relcache if not open.
           */
!         rebuild = !RelationHasReferenceCountZero(relation);
      }
-
-     RelationClearRelation(relation, rebuild);
  }

  /*
--- 1979,1996 ----
           * forget the "new" status of the relation, which is a useful
           * optimization to have.  Ditto for the new-relfilenode status.
           */
!         RelationIncrementReferenceCount(relation);
!         RelationClearRelation(relation, true);
!         RelationDecrementReferenceCount(relation);
      }
      else
      {
          /*
           * Pre-existing rels can be dropped from the relcache if not open.
           */
!         bool rebuild = !RelationHasReferenceCountZero(relation);
!         RelationClearRelation(relation, rebuild);
      }
  }

  /*
