Let's invent a function to report lock-wait-blocking PIDs - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Let's invent a function to report lock-wait-blocking PIDs |
Date | |
Msg-id | 7419.1363802552@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Let's invent a function to report lock-wait-blocking PIDs
Re: Let's invent a function to report lock-wait-blocking PIDs Re: Let's invent a function to report lock-wait-blocking PIDs Re: Let's invent a function to report lock-wait-blocking PIDs Re: Let's invent a function to report lock-wait-blocking PIDs |
List | pgsql-hackers |
I was just looking into why the -DCLOBBER_CACHE_ALWAYS buildfarm critters aren't managing to run the new "timeouts" isolation test successfully, despite very generous timeouts. The answer is that 2 seconds isn't quite enough time to parse+plan+execute the query that isolationtester uses to see if the current test session is blocked on a lock, if CLOBBER_CACHE_ALWAYS is on. Now, that query is totally horrible: appendPQExpBufferStr(&wait_query, "SELECT 1 FROM pg_locks holder, pg_locks waiter " "WHERE NOT waiter.granted AND waiter.pid = $1 " "AND holder.granted " "AND holder.pid <> $1 AND holder.pid IN ("); /* The spec syntax requires at least one session; assume that here.*/ appendPQExpBuffer(&wait_query, "%s", backend_pids[1]); for (i = 2; i < nconns; i++) appendPQExpBuffer(&wait_query,", %s", backend_pids[i]); appendPQExpBufferStr(&wait_query, ") " "AND holder.mode = ANY (CASE waiter.mode " "WHEN 'AccessShareLock' THEN ARRAY[" "'AccessExclusiveLock'] " "WHEN 'RowShareLock' THEN ARRAY[" "'ExclusiveLock'," "'AccessExclusiveLock'] " "WHEN 'RowExclusiveLock'THEN ARRAY[" "'ShareLock'," "'ShareRowExclusiveLock'," "'ExclusiveLock'," "'AccessExclusiveLock'] " "WHEN 'ShareUpdateExclusiveLock'THEN ARRAY[" "'ShareUpdateExclusiveLock'," "'ShareLock'," "'ShareRowExclusiveLock'," "'ExclusiveLock'," "'AccessExclusiveLock'] " "WHEN 'ShareLock' THEN ARRAY[" "'RowExclusiveLock'," "'ShareUpdateExclusiveLock'," "'ShareRowExclusiveLock'," "'ExclusiveLock'," "'AccessExclusiveLock'] " "WHEN 'ShareRowExclusiveLock' THEN ARRAY[" "'RowExclusiveLock'," "'ShareUpdateExclusiveLock'," "'ShareLock'," "'ShareRowExclusiveLock'," "'ExclusiveLock'," "'AccessExclusiveLock'] " "WHEN 'ExclusiveLock' THEN ARRAY[" "'RowShareLock'," "'RowExclusiveLock'," "'ShareUpdateExclusiveLock'," "'ShareLock'," "'ShareRowExclusiveLock'," "'ExclusiveLock'," "'AccessExclusiveLock']" "WHEN 'AccessExclusiveLock' THEN ARRAY[" "'AccessShareLock'," "'RowShareLock'," "'RowExclusiveLock'," "'ShareUpdateExclusiveLock'," "'ShareLock'," "'ShareRowExclusiveLock'," "'ExclusiveLock'," "'AccessExclusiveLock'] END) " "AND holder.locktype IS NOT DISTINCT FROM waiter.locktype " "AND holder.database IS NOTDISTINCT FROM waiter.database " "AND holder.relation IS NOT DISTINCT FROM waiter.relation " "AND holder.page IS NOT DISTINCT FROM waiter.page " "AND holder.tuple IS NOT DISTINCTFROM waiter.tuple " "AND holder.virtualxid IS NOT DISTINCT FROM waiter.virtualxid " "AND holder.transactionidIS NOT DISTINCT FROM waiter.transactionid " "AND holder.classid IS NOT DISTINCT FROMwaiter.classid " "AND holder.objid IS NOT DISTINCT FROM waiter.objid " "AND holder.objsubidIS NOT DISTINCT FROM waiter.objsubid "); This is way more knowledge than we (should) want a client to embed about which lock types block which others. What's worse, it's still wrong. The query will find cases where one of the test sessions *directly* blocks another one, but not cases where the blockage is indirect. For example, consider that A holds AccessShareLock, B is waiting for AccessExclusiveLock on the same object, and C is queued up behind B for another AccessShareLock. This query will not think that C is blocked, not even if B is part of the set of sessions of interest (because B will show the lock as not granted); but especially so if B is not part of the set. I think that such situations may not arise in the specific context that isolationtester says it's worried about, which is to disregard waits for locks held by autovacuum. But in general, you can't reliably tell who's blocking whom with a query like this. If isolationtester were the only market for this type of information, maybe it wouldn't be worth worrying about. But I'm pretty sure that there are a *lot* of monitoring applications out there that are trying to extract who-blocks-whom information from pg_locks. I hadn't realized before quite how painful it is to do that, even incorrectly. I propose that we should add a backend function that simplifies this type of query. The API that comes to mind is (name subject to bikeshedding) pg_blocking_pids(pid int) returns int[] defined to return NULL if the argument isn't the PID of any backend or that backend isn't waiting for a lock, and otherwise an array of the PIDs of the backends that are blocking it from getting the lock. I would compute the array as PIDs of backends already holding conflicting locks,plus PIDs of backends requesting conflicting locks that areahead of thisone in the lock's wait queue,plus PIDs of backends that block the latter group of PIDs(ie, are holding locks conflictingwith their requests,or are awaiting such locks and are ahead of them in the queue) There would be some cases where this definition would be too expansive, ie we'd release the waiter after only some of the listed sessions had released their lock or request. (That could happen for instance if we concluded we had to move up the waiter's request to escape a deadlock.) But I think that it's better to err in that direction than to underestimate the set of relevant PIDs. In the isolationtester use-case, we'd get the right answer by testing whether this function's result has any overlap with the set of PIDs of test sessions, ie select pg_blocking_pids($1) && array[pid1, pid2, pid3, ...] Thoughts? regards, tom lane
pgsql-hackers by date: