Adding some error context for lock wait failures - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Adding some error context for lock wait failures |
Date | |
Msg-id | 1602369.1752167154@sss.pgh.pa.us Whole thread Raw |
List | pgsql-hackers |
I noted a complaint [1] about how hard it is to debug unforeseen lock-timeout failures: we give no details about what we were waiting for. It's not hard to improve that situation, at least to the extent of printing numeric locktag details similar to what you get in deadlock reports. (It'd be nice to give object names, but just as with deadlocks, incurring any additional lock acquisitions here seems too scary.) The attached patch will produce reports like regression=# begin; BEGIN regression=*# lock table tenk1; ^CCancel request sent ERROR: canceling statement due to user request CONTEXT: waiting for AccessExclusiveLock on relation 77382 of database 77348 regression=!# abort; ROLLBACK regression=# set lock_timeout TO '1s'; SET regression=# begin; BEGIN regression=*# lock table tenk1; ERROR: canceling statement due to lock timeout CONTEXT: waiting for AccessExclusiveLock on relation 77382 of database 77348 and then the user can manually look up the object's identity. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/CAKE1AiY17RgcKCFba1N6Sz6SjHqSqvq%2BcfKWBfyKFEjT-L%2Bqkg%40mail.gmail.com diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 62f3471448e..01b752830fb 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -415,6 +415,7 @@ static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner); static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode); static void FinishStrongLockAcquire(void); static ProcWaitStatus WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner); +static void waitonlock_error_callback(void *arg); static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock); static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent); static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode, @@ -1931,6 +1932,7 @@ static ProcWaitStatus WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner) { ProcWaitStatus result; + ErrorContextCallback waiterrcontext; TRACE_POSTGRESQL_LOCK_WAIT_START(locallock->tag.lock.locktag_field1, locallock->tag.lock.locktag_field2, @@ -1939,6 +1941,12 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner) locallock->tag.lock.locktag_type, locallock->tag.mode); + /* Setup error traceback support for ereport() */ + waiterrcontext.callback = waitonlock_error_callback; + waiterrcontext.arg = (void *) locallock; + waiterrcontext.previous = error_context_stack; + error_context_stack = &waiterrcontext; + /* adjust the process title to indicate that it's waiting */ set_ps_display_suffix("waiting"); @@ -1990,6 +1998,8 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner) /* reset ps display to remove the suffix */ set_ps_display_remove_suffix(); + error_context_stack = waiterrcontext.previous; + TRACE_POSTGRESQL_LOCK_WAIT_DONE(locallock->tag.lock.locktag_field1, locallock->tag.lock.locktag_field2, locallock->tag.lock.locktag_field3, @@ -2000,6 +2010,28 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner) return result; } +/* + * error context callback for failures in WaitOnLock + * + * We report which lock was being waited on, in the same style used in + * deadlock reports. This helps with lock timeout errors in particular. + */ +static void +waitonlock_error_callback(void *arg) +{ + LOCALLOCK *locallock = (LOCALLOCK *) arg; + const LOCKTAG *tag = &locallock->tag.lock; + LOCKMODE mode = locallock->tag.mode; + StringInfoData locktagbuf; + + initStringInfo(&locktagbuf); + DescribeLockTag(&locktagbuf, tag); + + errcontext("waiting for %s on %s", + GetLockmodeName(tag->locktag_lockmethodid, mode), + locktagbuf.data); +} + /* * Remove a proc from the wait-queue it is on (caller must know it is on one). * This is only used when the proc has failed to get the lock, so we set its
pgsql-hackers by date: