It's now possible to fix this by putting a lock wait on the actual lock request, which wasn't available when I first wrote that, hence the crappy wait loop. Using the timeout handler would now be the preferred way to solve this. If needed, we can backpatch that to 9.3, where timeout handlers were introduced.
There's an example of how to use lock waits further down, in ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing it that way?
It looks like this will take some major surgery. The heavyweight lock manager doesn't play well with others when it comes to timeouts other than its own. LockBufferForCleanup is a simple retry loop, but the lock manager is much more complicated than that.
Not sure I understand this objection. I can't see a reason that my proposal wouldn't work.
On further thought, neither do I. The attached patch inverts ResolveRecoveryConflictWithLock so that it is called back from the lmgr code, like the ResolveRecoveryConflictWithBufferPin code. It does not try to cancel the conflicting lock holders from the signal handler; rather, it just loops an extra time and cancels the transactions on the next call.
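To make the control flow concrete, here is a rough standalone sketch of the "handler only wakes the waiter, the next callback does the cancelling" pattern. It is not the patch and uses no PostgreSQL code: every name in it (SimState, sim_resolve_conflict, sim_wait_for_lock and so on) is a hypothetical stand-in, and the timer and the sleep are simulated with a plain counter rather than real signals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these are PostgreSQL names. */
typedef struct
{
    long now;            /* simulated clock, in ms */
    long deadline;       /* point at which conflicts are no longer tolerated */
    bool timer_armed;    /* a wakeup has been scheduled for the deadline */
    bool timeout_fired;  /* set by the handler, consumed by the wait loop */
    int  holders;        /* conflicting lock holders still present */
    int  cancels_sent;   /* how many holders were cancelled */
} SimState;

/* Timeout handler: only records that the timer went off. It cancels nothing. */
static void
sim_timeout_handler(SimState *s)
{
    s->timeout_fired = true;
}

/* Callback invoked from the wait loop each time around. */
static void
sim_resolve_conflict(SimState *s)
{
    if (s->now >= s->deadline)
    {
        /* Deadline reached: cancel the conflicting holders from here. */
        s->cancels_sent += s->holders;
        s->holders = 0;
    }
    else if (!s->timer_armed)
    {
        /* Still within the grace period: arrange to be woken at the deadline. */
        s->timer_armed = true;
    }
}

/* Simulated sleep: the armed timer is the only thing that wakes the waiter. */
static void
sim_sleep_until_woken(SimState *s)
{
    assert(s->timer_armed);     /* otherwise this sketch would sleep forever */
    s->now = s->deadline;
    s->timer_armed = false;
    sim_timeout_handler(s);
}

/* Wait loop; returns how many times the callback was invoked. */
static int
sim_wait_for_lock(SimState *s)
{
    int callback_calls = 0;

    while (s->holders > 0)
    {
        sim_resolve_conflict(s);
        callback_calls++;
        if (s->holders == 0)
            break;                  /* conflicts cleared, lock can be granted */
        sim_sleep_until_woken(s);
        s->timeout_fired = false;   /* consume the wakeup and loop once more */
    }
    return callback_calls;
}
```

With two conflicting holders and a deadline still in the future, the callback runs twice: the first call arms the timer, the wakeup sends the loop around again, and the second call does the cancelling. If the deadline has already passed on entry, the first call cancels immediately.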
It looks like the deadlock detection is adequately handled within normal lmgr code within the back-ends of the other parties to the deadlock, so I didn't do a timeout for deadlock detection purposes.