Re: BUG #3245: PANIC: failed to re-find shared loc k o b j ect - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #3245: PANIC: failed to re-find shared loc k o b j ect |
Date | |
Msg-id | 462D2FFC.2080501@enterprisedb.com |
In response to | Re: BUG #3245: PANIC: failed to re-find shared loc k o b j ect (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #3245: PANIC: failed to re-find shared loc k o b j ect
|
List | pgsql-bugs |
Tom Lane wrote:
> I wrote:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> Dave, would you please create a new binary with the attached patch? And
>>> LOCK_DEBUG and assertions and debug enabled.
>
>> Also, it would be worth adding "lockmode" to the set of things printed
>> by the panic message in the patch I sent earlier.
>
> Also: as long as we are building a custom-hacked executable to probe
> into this, let's hack it to not remove the 2PC state file, so we can
> double check what's really in there. I believe what you'd need to
> remove is the RemoveTwoPhaseFile calls at twophase.c line 1583 (where
> it thinks it's "stale") and xact.c line 4223 (where it's replaying a
> XLOG_XACT_COMMIT_PREPARED WAL record).

Yeah, sounds like a good idea. Patch attached that incorporates all the
ideas this far:

1. More verbose PANIC message, including lockmode
2. More debug info in AtPrepare_Locks. I even put a DumpLocks call in
it, that should give us a good picture of what's in the lock structures
at the time of commit
3. Instead of removing twophase-file in recovery, rename it to
*.removed. (it will be ignored by postgresql after that, because it
doesn't follow the normal naming rules of 2PC state files)

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.25.2.1
diff -c -r1.25.2.1 twophase.c
*** src/backend/access/transam/twophase.c 13 Feb 2007 19:39:48 -0000 1.25.2.1
--- src/backend/access/transam/twophase.c 23 Apr 2007 21:58:29 -0000
***************
*** 1258,1263 ****
--- 1258,1276 ----
      char        path[MAXPGPATH];
  
      TwoPhaseFilePath(path, xid);
+ 
+     if (InRecovery)
+     {
+         char newpath[MAXPGPATH+10];
+         sprintf(newpath, "%s.removed", path);
+         if(rename(path, newpath))
+             if (errno != ENOENT || giveWarning)
+                 ereport(WARNING,
+                         (errcode_for_file_access(),
+                          errmsg("could not remove two-phase state file \"%s\": %m",
+                                 path)));
+     }
+     else
      if (unlink(path))
          if (errno != ENOENT || giveWarning)
              ereport(WARNING,
Index: src/backend/storage/lmgr/lock.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/lmgr/lock.c,v
retrieving revision 1.174
diff -c -r1.174 lock.c
*** src/backend/storage/lmgr/lock.c 4 Oct 2006 00:29:57 -0000 1.174
--- src/backend/storage/lmgr/lock.c 23 Apr 2007 21:52:23 -0000
***************
*** 1796,1801 ****
--- 1796,1817 ----
      HASH_SEQ_STATUS status;
      LOCALLOCK  *locallock;
  
+ #ifdef LOCK_DEBUG
+     {
+         int i;
+         /*
+          * Must grab LWLocks in partition-number order to avoid LWLock deadlock.
+          */
+         for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+             LWLockAcquire(FirstLockMgrLock + i, LW_SHARED);
+ 
+         DumpLocks(MyProc);
+ 
+         for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+             LWLockRelease(FirstLockMgrLock + i);
+     }
+ #endif
+ 
      /*
       * We don't need to touch shared memory for this --- all the necessary
       * state information is in the locallock table.
***************
*** 1830,1835 ****
--- 1846,1854 ----
                      (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                       errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
  
+         PROCLOCK_PRINT("AtPrepare_Locks", locallock->proclock);
+         LOCK_PRINT("AtPrepare_Locks", locallock->lock, locallock->tag.mode);
+ 
          /*
           * Create a 2PC record.
           */
***************
*** 2430,2436 ****
                                                  HASH_FIND,
                                                  NULL);
      if (!lock)
!         elog(PANIC, "failed to re-find shared lock object");
  
      /*
       * Re-find the proclock object (ditto).
--- 2449,2462 ----
                                                  HASH_FIND,
                                                  NULL);
      if (!lock)
!         elog(PANIC, "failed to re-find shared lock object: %u %u %u %u %u %u, mode %s",
!              locktag->locktag_field1,
!              locktag->locktag_field2,
!              locktag->locktag_field3,
!              locktag->locktag_field4,
!              locktag->locktag_type,
!              locktag->locktag_lockmethodid,
!              LockMethods[lockmethodid]->lockModeNames[lockmode]);
  
      /*
       * Re-find the proclock object (ditto).
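
For readers who want to try the "rename instead of remove" idea from item 3 outside a PostgreSQL build, here is a minimal standalone sketch in plain POSIX C. It is not part of the patch: the function name retire_state_file, the keep_for_debugging flag, and the 4096-byte buffer are all hypothetical, and it reports errors with fprintf rather than ereport. It also swaps sprintf for snprintf so an over-long path is rejected instead of overflowing the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): retire a state file.
 *
 * If keep_for_debugging is true, the file is renamed to "<path>.removed"
 * so its contents stay available for inspection; otherwise it is unlinked.
 * A file that is already missing (ENOENT) is treated as success.
 * Returns 0 on success, -1 on any other failure.
 */
static int
retire_state_file(const char *path, bool keep_for_debugging)
{
    char newpath[4096];     /* assumed upper bound, stands in for MAXPGPATH+10 */
    int  rc;

    assert(path != NULL);

    if (keep_for_debugging)
    {
        /* snprintf bounds the write; reject paths that would be truncated */
        int n = snprintf(newpath, sizeof(newpath), "%s.removed", path);

        if (n < 0 || (size_t) n >= sizeof(newpath))
        {
            fprintf(stderr, "path too long: \"%s\"\n", path);
            return -1;
        }
        rc = rename(path, newpath);
    }
    else
        rc = unlink(path);

    if (rc != 0 && errno != ENOENT)
    {
        fprintf(stderr, "could not retire state file \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling retire_state_file(path, true) leaves "<path>.removed" behind. Any scanner that only accepts the normal file-name pattern will then skip the renamed file, which is the property the patch relies on.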