Reducing contention for the LockMgrLock - Mailing list pgsql-hackers
From: Tom Lane
Subject: Reducing contention for the LockMgrLock
Msg-id: 4037.1133992754@sss.pgh.pa.us
List: pgsql-hackers
We've suspected for a while that once we'd fixed the buffer manager's use of a single global BufMgrLock, the next contention hotspot would be the lock manager's LockMgrLock. I've now seen actual evidence of that in profiling pgbench: using a modified backend that counts LWLock-related wait operations, the LockMgrLock is responsible for an order of magnitude more blockages than the next highest LWLock:

PID 12971 lwlock LockMgrLock: shacq 0 exacq 50630 blk 3354
PID 12979 lwlock LockMgrLock: shacq 0 exacq 49706 blk 3323
PID 12976 lwlock LockMgrLock: shacq 0 exacq 50567 blk 3304
PID 12962 lwlock LockMgrLock: shacq 0 exacq 50635 blk 3278
PID 12974 lwlock LockMgrLock: shacq 0 exacq 50599 blk 3251
PID 12972 lwlock LockMgrLock: shacq 0 exacq 50204 blk 3243
PID 12973 lwlock LockMgrLock: shacq 0 exacq 50321 blk 3200
PID 12978 lwlock LockMgrLock: shacq 0 exacq 50266 blk 3177
PID 12977 lwlock LockMgrLock: shacq 0 exacq 50379 blk 3148
PID 12975 lwlock LockMgrLock: shacq 0 exacq 49790 blk 3124
PID 12971 lwlock WALInsertLock: shacq 0 exacq 24022 blk 408
PID 12972 lwlock WALInsertLock: shacq 0 exacq 24021 blk 393
PID 12976 lwlock WALInsertLock: shacq 0 exacq 24017 blk 390
PID 12977 lwlock WALInsertLock: shacq 0 exacq 24021 blk 388
PID 12973 lwlock WALInsertLock: shacq 0 exacq 24018 blk 379
PID 12962 lwlock WALInsertLock: shacq 0 exacq 24024 blk 377
PID 12974 lwlock WALInsertLock: shacq 0 exacq 24016 blk 367
PID 12975 lwlock WALInsertLock: shacq 0 exacq 24021 blk 366
PID 12978 lwlock WALInsertLock: shacq 0 exacq 24023 blk 354
PID 12979 lwlock WALInsertLock: shacq 0 exacq 24033 blk 321
PID 12973 lwlock ProcArrayLock: shacq 45214 exacq 6003 blk 241
PID 12971 lwlock ProcArrayLock: shacq 45355 exacq 6003 blk 225
(etc)

We had also seen evidence to this effect from OSDL:
http://archives.postgresql.org/pgsql-patches/2003-12/msg00365.php

So it seems it's time to start thinking about how to reduce contention for the LockMgrLock. There are no interesting read-only operations on the shared lock table, so there doesn't seem to be any traction to be gained by making some operations take just shared access to the LockMgrLock.

The best idea I've come up with after a bit of thought is to replace the shared lock table with N independent tables representing partitions of the lock space. Each lock would be assigned to one of these partitions based on, say, a hash of its LOCKTAG (a rough sketch of this mapping follows below). I'm envisioning N of 16 or so to achieve (hopefully) about an order-of-magnitude reduction in contention. There would be a separate LWLock guarding each partition; the LWLock for a given partition would be considered to protect the LOCK objects assigned to that partition, all the PROCLOCK objects associated with each such LOCK, and the shared-memory hash tables holding these objects (each partition would need its own hash tables).

A PGPROC's lock-related fields are only interesting when it is waiting for a lock, so we could say that the LWLock for the partition containing the lock it is waiting for must be held to examine or change these fields.

The per-PGPROC list of all PROCLOCKs belonging to that PGPROC is a bit tricky to handle, since it necessarily spans partitions. We might be able to deal with this with suitable rules about when the list can be touched, but I've not worked that out in detail. Another possibility is to break this list apart into N lists, one per partition, but that would bloat the PGPROC array a bit, especially if we wanted a larger N.
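To make the partition assignment concrete, here is a minimal sketch; every name and constant in it (NUM_LOCK_PARTITIONS, LockTagToPartition) is provisional, not existing code. Imagine it living in lock.c, where LOCKTAG and the shared-hash-table hash function tag_hash() are already in scope:

/* Provisional sketch only --- none of these names exist yet. */
#define NUM_LOCK_PARTITIONS  16		/* the "N" above; a power of 2 */

/*
 * Map a lock's LOCKTAG to one of the N partitions.  Each partition has
 * its own LWLock and its own LOCK and PROCLOCK hash tables.
 */
static int
LockTagToPartition(const LOCKTAG *locktag)
{
	/* tag_hash() is the hash function we already use for shared hash
	 * tables keyed by fixed-size structs. */
	uint32		hashcode = tag_hash((const void *) locktag,
									sizeof(LOCKTAG));

	return hashcode % NUM_LOCK_PARTITIONS;
}

With N a power of 2 the modulo reduces to a mask, and the hashcode could be computed once per lock request and carried along rather than recomputed at each table access.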
The basic LockAcquire and LockRelease operations would only need to acquire the LWLock for the partition containing the lock they are interested in; this is what gives us the contention reduction.

LockReleaseAll is also interesting from a performance point of view, since it executes at every transaction exit. If we divide PGPROC's PROCLOCK list into N lists, it will be very easy for LockReleaseAll to take only the partition locks it actually needs in order to release its locks; if not, we might have to resort to scanning the list N times, once while holding the LWLock of each partition. (A sketch of the split-list approach follows below.)

I think CheckDeadLock will probably require taking all the partition LWLocks, but as long as it does so in a predetermined order, there is no risk of deadlock on the partition LWLocks themselves. One hopes this is not a performance-critical operation. Ditto for GetLockStatusData.

One objection I can see to this idea is that having N lock hash tables instead of one will eat a larger amount of shared memory in hashtable overhead. But the lock hash tables are fairly small relative to the shared buffer array (given typical configuration parameters), so this doesn't seem like a major problem.

Another objection is that LockReleaseAll will get slower (since it will certainly call LWLockAcquire/LWLockRelease more times), and in situations that aren't heavily concurrent there won't be any compensating gain. I think this won't be a significant effect, but there's probably no way to tell for sure without actually writing the code and testing it.

While at it, I'm inclined to get rid of the current assumption that there are logically separate hash tables for different LockMethodIds. AFAICS, all that's doing for us is creating a level of confusion; there's nothing on the horizon suggesting we'd ever actually make use of that flexibility.
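For concreteness, here is roughly what LockReleaseAll might look like under the split-list design, ignoring the lock-method argument the real function carries. Again, every name here is provisional: it assumes we split the per-PGPROC PROCLOCK list into one SHM_QUEUE per partition, say myProcLocks[], and that the partition LWLocks occupy a contiguous range starting at a hypothetical FirstLockMgrLock:

void
LockReleaseAll(void)
{
	int			partition;

	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
	{
		/* Cheap test lets us skip partitions in which we hold
		 * nothing, without touching their LWLocks at all. */
		if (SHMQueueEmpty(&MyProc->myProcLocks[partition]))
			continue;

		LWLockAcquire(FirstLockMgrLock + partition, LW_EXCLUSIVE);

		/* ... walk this partition's PROCLOCK list, releasing each
		 * PROCLOCK and garbage-collecting its LOCK if it's no
		 * longer referenced ... */

		LWLockRelease(FirstLockMgrLock + partition);
	}
}

CheckDeadLock and GetLockStatusData would instead take all N locks, always in ascending partition order:

	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
		LWLockAcquire(FirstLockMgrLock + i, LW_EXCLUSIVE);

and it's the fixed ordering that guarantees two backends doing this concurrently can't deadlock on the partition LWLocks themselves.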
Thoughts, better ideas?

			regards, tom lane