Re: BUG #3245: PANIC: failed to re-find shared lock object - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #3245: PANIC: failed to re-find shared lock object
Msg-id 462D2FFC.2080501@enterprisedb.com
In response to Re: BUG #3245: PANIC: failed to re-find shared lock object (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #3245: PANIC: failed to re-find shared lock object
List pgsql-bugs
Tom Lane wrote:
> I wrote:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> Dave, would you please create a new binary with the attached patch? And
>>> LOCK_DEBUG and assertions and debug enabled.
>
>> Also, it would be worth adding "lockmode" to the set of things printed
>> by the panic message in the patch I sent earlier.
>
> Also: as long as we are building a custom-hacked executable to probe
> into this, let's hack it to not remove the 2PC state file, so we can
> double check what's really in there.  I believe what you'd need to
> remove is the RemoveTwoPhaseFile calls at twophase.c line 1583 (where
> it thinks it's "stale") and xact.c line 4223 (where it's replaying a
> XLOG_XACT_COMMIT_PREPARED WAL record).

Yeah, sounds like a good idea.

Attached is a patch that incorporates all the ideas so far:

1. A more verbose PANIC message, including the lock tag fields and the
lockmode.
2. More debug info in AtPrepare_Locks. I even put a DumpLocks call in
it, which should give us a good picture of what's in the lock structures
at PREPARE time.
3. Instead of removing the two-phase state file during recovery, rename
it to *.removed. PostgreSQL will ignore it after that, because the new
name doesn't follow the normal naming rules for 2PC state files.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.25.2.1
diff -c -r1.25.2.1 twophase.c
*** src/backend/access/transam/twophase.c    13 Feb 2007 19:39:48 -0000    1.25.2.1
--- src/backend/access/transam/twophase.c    23 Apr 2007 21:58:29 -0000
***************
*** 1258,1263 ****
--- 1258,1276 ----
      char        path[MAXPGPATH];

      TwoPhaseFilePath(path, xid);
+
+     if (InRecovery)
+     {
+         char newpath[MAXPGPATH+10];
+         sprintf(newpath, "%s.removed", path);
+         if(rename(path, newpath))
+             if (errno != ENOENT || giveWarning)
+                 ereport(WARNING,
+                         (errcode_for_file_access(),
+                          errmsg("could not rename two-phase state file \"%s\" to \"%s\": %m",
+                                 path, newpath)));
+     }
+     else
      if (unlink(path))
          if (errno != ENOENT || giveWarning)
              ereport(WARNING,
Index: src/backend/storage/lmgr/lock.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/lmgr/lock.c,v
retrieving revision 1.174
diff -c -r1.174 lock.c
*** src/backend/storage/lmgr/lock.c    4 Oct 2006 00:29:57 -0000    1.174
--- src/backend/storage/lmgr/lock.c    23 Apr 2007 21:52:23 -0000
***************
*** 1796,1801 ****
--- 1796,1817 ----
      HASH_SEQ_STATUS status;
      LOCALLOCK  *locallock;

+ #ifdef LOCK_DEBUG
+  {
+     int i;
+     /*
+      * Must grab LWLocks in partition-number order to avoid LWLock deadlock.
+      */
+     for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+         LWLockAcquire(FirstLockMgrLock + i, LW_SHARED);
+
+     DumpLocks(MyProc);
+
+     for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+         LWLockRelease(FirstLockMgrLock + i);
+  }
+ #endif
+
      /*
       * We don't need to touch shared memory for this --- all the necessary
       * state information is in the locallock table.
***************
*** 1830,1835 ****
--- 1846,1854 ----
                      (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                       errmsg("cannot PREPARE a transaction that has operated on temporary tables")));

+         PROCLOCK_PRINT("AtPrepare_Locks", locallock->proclock);
+         LOCK_PRINT("AtPrepare_Locks", locallock->lock, locallock->tag.mode);
+
          /*
           * Create a 2PC record.
           */
***************
*** 2430,2436 ****
                                                  HASH_FIND,
                                                  NULL);
      if (!lock)
!         elog(PANIC, "failed to re-find shared lock object");

      /*
       * Re-find the proclock object (ditto).
--- 2449,2462 ----
                                                  HASH_FIND,
                                                  NULL);
      if (!lock)
!          elog(PANIC, "failed to re-find shared lock object: %u %u %u %u %u %u, mode %s",
!              locktag->locktag_field1,
!              locktag->locktag_field2,
!              locktag->locktag_field3,
!              locktag->locktag_field4,
!              locktag->locktag_type,
!              locktag->locktag_lockmethodid,
!              LockMethods[lockmethodid]->lockModeNames[lockmode]);

      /*
       * Re-find the proclock object (ditto).
