Re: BUG #5412: test case produced, possible race condition. - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #5412: test case produced, possible race condition. |
Date | |
Msg-id | 4BC5A28A.5060902@enterprisedb.com |
In response to | Re: BUG #5412: test case produced, possible race condition. (Rusty Conover <rconover@infogears.com>) |
Responses |
Re: BUG #5412: test case produced, possible race condition.
Re: BUG #5412: test case produced, possible race condition. |
List | pgsql-bugs |
Rusty Conover wrote:
> It seems like this is a race condition caused by the system catalog cache not being locked properly. I've included a Perl script below that causes the crash on my box consistently.
>
> The script forks two different types of processes:
>
> #1 - begin transaction, create a few temp tables and analyze them in a transaction, commit (running in database foobar_1)
> #2 - begin transaction, truncate table, insert records into table from select in a transaction, commit (running in database foobar_2)
>
> I set up the process to have 10 instances of task #1 and 1 instance of task #2.
>
> Running this script causes the crash of postgres within seconds on my box.

Thanks, that script crashes on my laptop too, with assertions enabled.

According to the comments above RelationClearRelation(), if it's called with 'rebuild=true', the caller should hold a lock on the relation, i.e. refcnt > 0. That's not the case in RelationFlushRelation() when it rebuilds a new relcache entry.

Attached patch should fix that, by incrementing the reference count while the entry is rebuilt. It also adds an Assertion in RelationClearRelation() to check that the refcnt is indeed > 0.

Comments?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.287.2.3
diff -c -r1.287.2.3 relcache.c
*** src/backend/utils/cache/relcache.c	13 Jan 2010 23:07:15 -0000	1.287.2.3
--- src/backend/utils/cache/relcache.c	14 Apr 2010 11:09:23 -0000
***************
*** 1773,1778 ****
--- 1773,1781 ----
  {
      Oid         old_reltype = relation->rd_rel->reltype;
  
+     Assert((rebuild && relation->rd_refcnt > 0) ||
+            (!rebuild && relation->rd_refcnt == 0));
+ 
      /*
       * Make sure smgr and lower levels close the relation's files, if they
       * weren't closed already.  If the relation is not getting deleted, the
***************
*** 1968,1975 ****
  static void
  RelationFlushRelation(Relation relation)
  {
-     bool        rebuild;
- 
      if (relation->rd_createSubid != InvalidSubTransactionId ||
          relation->rd_newRelfilenodeSubid != InvalidSubTransactionId)
      {
--- 1971,1976 ----
***************
*** 1978,1994 ****
           * forget the "new" status of the relation, which is a useful
           * optimization to have.  Ditto for the new-relfilenode status.
           */
!         rebuild = true;
      }
      else
      {
          /*
           * Pre-existing rels can be dropped from the relcache if not open.
           */
!         rebuild = !RelationHasReferenceCountZero(relation);
      }
- 
-     RelationClearRelation(relation, rebuild);
  }
  
  /*
--- 1979,1996 ----
           * forget the "new" status of the relation, which is a useful
           * optimization to have.  Ditto for the new-relfilenode status.
           */
!         RelationIncrementReferenceCount(relation);
!         RelationClearRelation(relation, true);
!         RelationDecrementReferenceCount(relation);
      }
      else
      {
          /*
           * Pre-existing rels can be dropped from the relcache if not open.
           */
!         bool rebuild = !RelationHasReferenceCountZero(relation);
!         RelationClearRelation(relation, rebuild);
      }
  }
  
  /*
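For readers who want to see the invariant in isolation, here is a rough standalone sketch of the idea behind the patch. It is a toy model, not PostgreSQL code: ToyRel, toy_clear() and toy_flush() are hypothetical names invented for illustration, and the "rebuild" is reduced to bumping a counter. What it tries to show is only the pin/unpin pattern: the flush path takes a reference before asking for a rebuild, so the assertion in the clear path (rebuild implies refcnt > 0, drop implies refcnt == 0) holds on every branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a relcache entry; NOT PostgreSQL's RelationData. */
typedef struct ToyRel
{
    int  refcnt;        /* number of active references (pins) */
    bool is_new;        /* "created in this transaction" status we must keep */
    int  rebuild_count; /* how many times the entry was rebuilt in place */
    bool dropped;       /* true once the entry has been discarded */
} ToyRel;

/*
 * Toy analogue of the clear step. The assertion has the same shape as the
 * one added by the patch: a rebuild needs a pin, a drop needs no pins.
 */
static void
toy_clear(ToyRel *rel, bool rebuild)
{
    assert((rebuild && rel->refcnt > 0) || (!rebuild && rel->refcnt == 0));

    if (rebuild)
        rel->rebuild_count++;   /* entry survives, contents refreshed */
    else
        rel->dropped = true;    /* entry is thrown away */
}

/* Toy analogue of the flush step, following the structure of the patch. */
static void
toy_flush(ToyRel *rel)
{
    if (rel->is_new)
    {
        /* Pin the entry for the duration of the rebuild, then unpin. */
        rel->refcnt++;
        toy_clear(rel, true);
        rel->refcnt--;
    }
    else
    {
        /* Entries that nobody references can simply be dropped. */
        bool rebuild = (rel->refcnt != 0);

        toy_clear(rel, rebuild);
    }
}
```

In this model, flushing a "new" entry with refcnt == 0 would trip the assertion without the increment/decrement pair, which is the unpinned rebuild described above. The real crash of course involves much more state than this sketch captures.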