> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > In my understanding, the deadlock check is performed every time the
> > backend acquires a lock. Once it acquires the lock, it kills the timer.
> > However, under heavy transaction loads such as pgbench generates,
> > chances are that the check fires and tries to acquire a spin lock.
> > That seems to be the situation.
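The timer-guarded wait described above can be modelled in a few lines. This is a toy sketch, not PostgreSQL's code: it assumes a POSIX system where the timer is a one-shot `setitimer(ITIMER_REAL, ...)` delivering SIGALRM, and the names `guarded_lock_wait`, `set_deadlock_timer` and `deadlock_check_fired` are hypothetical. Unlike the handler described above, which does real work (including taking a spin lock), this handler only records that it fired.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical flag: set when the deadlock-check timer expires during a wait. */
static volatile sig_atomic_t deadlock_check_fired = 0;

/* SIGALRM handler; a real backend would run its deadlock check here. */
static void on_deadlock_timeout(int signo)
{
    (void) signo;
    deadlock_check_fired = 1;   /* only async-signal-safe work: set a flag */
}

/* Arm a one-shot ITIMER_REAL timer for ms milliseconds; ms == 0 disarms it. */
static int set_deadlock_timer(long ms)
{
    struct itimerval t;
    memset(&t, 0, sizeof t);            /* it_interval = 0: do not repeat */
    t.it_value.tv_sec = ms / 1000;
    t.it_value.tv_usec = (ms % 1000) * 1000;
    return setitimer(ITIMER_REAL, &t, NULL);
}

/* Simulate waiting wait_ms for a lock while a timeout_ms deadlock timer runs.
   Returns 1 if the timer fired during the wait, 0 if not, -1 on error. */
static int guarded_lock_wait(long wait_ms, long timeout_ms)
{
    struct sigaction sa;
    struct timespec ts;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_deadlock_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    deadlock_check_fired = 0;
    if (set_deadlock_timer(timeout_ms) != 0)
        return -1;

    /* Stand-in for sleeping until the lock is granted; SIGALRM cuts it short. */
    ts.tv_sec = wait_ms / 1000;
    ts.tv_nsec = (wait_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);

    set_deadlock_timer(0);              /* lock "granted": kill the timer */
    return deadlock_check_fired;
}
```

A short wait under a long timeout returns 0 (the timer is killed first); a long wait under a short timeout returns 1. The window between the wait ending and the timer being disarmed is where, under heavy load, the check can still fire.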
>
> It could be that with ~1000 backends all waiting for the same lock, the
> deadlock-checking code just plain takes too long to run. It might have
> an O(N^2) or worse behavior in the length of the queue; I don't think
> the code was ever analyzed for such problems.
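To make the O(N^2) concern concrete, here is a toy model of a naive scan in which every waiter in a queue is compared against every other waiter. It is an illustration of the hypothesis only, not the actual DeadLockCheck algorithm; the two lock modes and the function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical lock modes for the toy model. */
typedef enum { MODE_SHARE, MODE_EXCLUSIVE } lock_mode;

/* Two requests conflict unless both are shared. */
static int modes_conflict(lock_mode a, lock_mode b)
{
    return a == MODE_EXCLUSIVE || b == MODE_EXCLUSIVE;
}

/* Naive scan: compare each waiter with every other waiter in the queue.
   Returns the number of pairwise comparisons made; stores the number of
   conflicting (ordered) pairs found in *conflicts. */
static unsigned long naive_queue_scan(const lock_mode *queue, size_t n,
                                      unsigned long *conflicts)
{
    unsigned long comparisons = 0;

    *conflicts = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (i == j)
                continue;
            comparisons++;
            if (modes_conflict(queue[i], queue[j]))
                (*conflicts)++;
        }
    }
    return comparisons;
}
```

With 1000 waiters a single scan makes 1000 × 999 = 999,000 comparisons. If each of those 1000 waiters ran such a scan when its own timer fired, the total would be about 10^9, which would explain very long run times without any bug in the lock code itself.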
>
> Do you want to try adding some instrumentation to HandleDeadlock to see
> how long it runs on each call?
I added some code to HandleDeadLock to measure how long the
LockLockTable and DeadLockCheck calls take. The following are the
results from running pgbench -c 1000 (which failed with a "stuck spin
lock" error). "real time" shows how long the calls actually ran
(measured with gettimeofday); "user time" and "system time" are
measured by calling getrusage. All times are in milliseconds.
   function    |   measure   | min |  max   |        avg
---------------+-------------+-----+--------+-------------------
 LockLockTable | real time   |   0 | 867873 | 152874.9015151515
 LockLockTable | user time   |   0 |     30 |      1.2121212121
 LockLockTable | system time |   0 |   2140 |    366.5909090909
 DeadLockCheck | real time   |   0 |  87671 |   3463.6996197719
 DeadLockCheck | user time   |   0 |    330 |     14.2205323194
 DeadLockCheck | system time |   0 |    100 |      2.5095057034