Improve heavyweight locks instead of building new lock managers? - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | Improve heavyweight locks instead of building new lock managers?
Date |
Msg-id | 20200211042229.msv23badgqljrdg2@alap3.anarazel.de
Responses | Re: Improve heavyweight locks instead of building new lock managers?; Re: Improve heavyweight locks instead of building new lock managers?
List | pgsql-hackers
Hi,

I started this as a reply to
https://www.postgresql.org/message-id/CAA4eK1JMgUfiTdAgr9k3nA4cdKvvruousKBg7FWTDNzQgBpOZA%40mail.gmail.com
but the email morphed into a distinct topic that I thought it best to separate out.

We've had a number of cases where heavyweight locks turned out to be too, well, heavy. And I don't mean cases where a fixed number of locks can do the job. The latest case of this is the above thread, where a separate lock manager just for relation extension was implemented.

My problem with that patch is that it just seems like the wrong direction architecturally to me. There are two main aspects to this:

1) It basically builds another, more lightweight but less capable, lock manager that can lock more objects than we can have distinct locks for. It is faster because it uses *one* hashtable without conflict handling, because it has fewer lock modes, and because it doesn't support detecting deadlocks. And probably some other things.

2) A lot of the contention around file extension comes from us doing multiple expensive things under one lock (determining the current relation size, searching for a victim buffer, extending the file), and in tiny increments (growing a 1TB table by 8kb). This patch doesn't address that at all.

I'm only planning to address 1) in this thread, and will write separately about 2) as part of the above thread. So more on that later.

With regard to 1):

To me this seems to go in the direction of having multiple bespoke lock managers with slightly different feature sets, different debugging / logging / monitoring support, and separate locking code for each. That's bad for maintainability.

I think one crucial piece of analysis that I've not seen fully done is why a separate lock manager is actually faster. An early (quite different) version of this patch yielded the following micro-benchmark:
https://www.postgresql.org/message-id/CAD21AoD1NenjTmD%3D5ypOBo9%3DFRtAtWVxUcHqHxY3wNos_5Bb5w%40mail.gmail.com

On 2017-11-21 07:19:30 +0900, Masahiko Sawada wrote:
> Also I've done a micro-benchmark; calling LockRelationForExtension and
> UnlockRelationForExtension tightly in order to measure the number of
> lock/unlock cycles per second. The result is,
> PATCHED = 3.95892e+06 (cycles/sec)
> HEAD = 1.15284e+06 (cycles/sec)
> The patched is 3 times faster than current HEAD.

but I've not actually seen an analysis as to *why* precisely there's a threefold difference in cost, and whether the problem could instead be addressed by making the difference much smaller.

My guess would be that the difference, to a very large degree, comes from avoiding dynahash lookups. We currently have quite a few hashtable lookups:

* LockRelationForExtension
** LockAcquireExtended does a HASH_ENTER in LockMethodLocalHash
** SetupLockInTable does HASH_ENTER_NULL in LockMethodLockHash
** SetupLockInTable does HASH_ENTER_NULL in LockMethodProcLockHash
* UnlockRelationForExtension
** LockRelease does HASH_FIND in LockMethodLocalHash
** CleanUpLock does HASH_REMOVE in LockMethodProcLockHash
** CleanUpLock does HASH_REMOVE in LockMethodLockHash
** RemoveLocalLock does HASH_REMOVE in LockMethodLocalHash

It's pretty easy to believe that a lock mapping like:

+    relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(&tag)];

with

+typedef struct RelExtLockTag
+{
+    Oid        dbid;        /* InvalidOid for a shared relation */
+    Oid        relid;
+} RelExtLockTag;
...
+static inline uint32
+RelExtLockTargetTagToIndex(RelExtLockTag *locktag)
+{
+    return tag_hash(locktag, sizeof(RelExtLockTag)) % N_RELEXTLOCK_ENTS;
+}

and then just waiting till that specific hash entry becomes free, is cheaper than *7* dynahash lookups. Especially if doing so doesn't require any mapping locks.
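To make the comparison concrete, here is a minimal, hypothetical sketch of what "waiting till that specific hash entry becomes free" amounts to: one compare-and-swap on one preallocated slot, with no dynahash traffic and no mapping lock. The struct layout, the state encoding, and the function names below are my own assumptions for illustration, not the patch's actual code:

#include "postgres.h"
#include "port/atomics.h"

#define N_RELEXTLOCK_ENTS 1024

typedef struct RelExtLockSlot
{
    pg_atomic_uint32 state;     /* 0 = free, 1 = held; a real version would
                                 * also need share counts and waiter state */
} RelExtLockSlot;

/* in reality this array would live in shared memory, initialized at startup */
static RelExtLockSlot RelExtLockArray[N_RELEXTLOCK_ENTS];

static void
rel_ext_lock_acquire_sketch(Oid dbid, Oid relid)
{
    /* stand-in for the tag_hash()-based mapping quoted above */
    uint32      idx = (dbid ^ relid) % N_RELEXTLOCK_ENTS;
    RelExtLockSlot *slot = &RelExtLockArray[idx];

    for (;;)
    {
        uint32      expected = 0;

        /* a single atomic compare-and-swap replaces all the dynahash work */
        if (pg_atomic_compare_exchange_u32(&slot->state, &expected, 1))
            return;
        /* slot busy: the patch would sleep on a condition variable here */
    }
}

static void
rel_ext_lock_release_sketch(Oid dbid, Oid relid)
{
    uint32      idx = (dbid ^ relid) % N_RELEXTLOCK_ENTS;

    pg_atomic_write_u32(&RelExtLockArray[idx].state, 0);
}

The price of this design is the fixed slot count: two unrelated relations that happen to map to the same slot wait on each other.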
But it's not at all clear to me whether we cannot sometimes / always avoid some of this overhead. Improving lock.c would be a much bigger win, without building separate infrastructure. E.g. we could:

- Have an extended LockAcquire API where the caller provides a stack variable for caching information until the LockRelease call, avoiding the separate LockMethodLocalHash lookup at release time (a rough interface sketch follows after this list).

- Instead of always implementing HASH_REMOVE as a completely fresh lookup in dynahash, we should provide an API for removing an object we *know* is in the hashtable at a specific pointer. We're obviously already relying on that address to stay constant, so we'd not lose flexibility. With this separate API we can avoid the bucket lookup, walking through each element in that bucket, and comparing the hash key every time (which is quite expensive for a full LOCKTAG). For the biggest benefit we'd have to make the bucket lists doubly linked, but even if we were to just walk the bucket from the start and compare entries by pointer, we'd win big (also sketched below).

- We should try to figure out whether we can avoid needing the separate lock table for PROCLOCKs. As far as I can tell there's not a single reason for it to be a hashtable, rather than something like NUM_LOCK_PARTITIONS lists of free PROCLOCKs.

- Whenever we do a relation extension, we had better already hold a normal relation lock. We don't actually need an entirely distinct type of lock for extensions, one that theoretically could support N extension locks for each relation. The fact that these are associated could be utilized in different ways:

  - We could define relation extension as a separate lock level only conflicting with itself (and perhaps treat AELs as including it). That'd avoid needing separate shared memory states in many cases.

    This might be too problematic, because extension locks don't have the same desired behaviour around group locking (and a small army of other issues).

  - We could keep a separate extension lock cached inside the relation lock. The first time a transaction needs to extend, it does the normal work, and afterwards stores the PROCLOCK in the LOCALLOCK (or something like that). Once extension is done, we don't release the lock entirely, but instead just reduce the lock level to a new non-conflicting lock level.

Alternatively we could implement something very similar outside of lock.c, e.g. by caching the LOCALLOCK* (possibly identified by an integer or such) in RelationData. That'd require enough support from lock.c to be able to make that really cheap.
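To illustrate the first item above, here is a rough interface sketch. LockAcquireCached, LockReleaseCached, and LockAcquireHandle do not exist in lock.c today; they are hypothetical names, shown only to make the shape of the idea concrete:

#include "postgres.h"
#include "storage/lock.h"

/* hypothetical: caller-owned, typically a local variable on the stack */
typedef struct LockAcquireHandle
{
    LOCALLOCK  *locallock;      /* filled in by LockAcquireCached */
} LockAcquireHandle;

/* hypothetical extended entry points in lock.c */
extern LockAcquireResult LockAcquireCached(const LOCKTAG *locktag,
                                           LOCKMODE lockmode,
                                           bool dontWait,
                                           LockAcquireHandle *handle);
extern void LockReleaseCached(LockAcquireHandle *handle);

/* how a LockRelationForExtension-style caller could use it */
static void
extend_relation_with_cached_lock(const LOCKTAG *tag)
{
    LockAcquireHandle handle;

    (void) LockAcquireCached(tag, ExclusiveLock, false, &handle);
    /* ... determine current size, pick victim buffer, extend the file ... */
    LockReleaseCached(&handle); /* no LockMethodLocalHash lookup needed */
}

The point is simply that the release path gets handed the LOCALLOCK it needs instead of re-finding it.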
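Similarly, a sketch of the second item; hash_remove_known_entry is not an existing dynahash function but the assumed addition, and the bucket-list details (singly vs. doubly linked) are glossed over:

#include "postgres.h"
#include "storage/lock.h"
#include "utils/hsearch.h"

/* hypothetical dynahash addition: remove an entry we already point at */
extern void hash_remove_known_entry(HTAB *hashp, void *entry);

/* how CleanUpLock() could drop a PROCLOCK it already has in hand */
static void
remove_proclock_sketch(HTAB *proclock_hash, PROCLOCK *proclock)
{
    /*
     * Today this is hash_search(proclock_hash, &proclock->tag, HASH_REMOVE,
     * NULL): it re-hashes the full PROCLOCKTAG, walks the bucket, and
     * compares keys.  With the hypothetical API all of that is skipped.
     */
    hash_remove_known_entry(proclock_hash, proclock);
}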
The second big difference between lock.c and the proposed relation extension lock manager is that the latter doesn't have a "mapping" lock. It instead solves the mapping without a lock by having exactly one potential location for each lock, using atomic ops to manipulate the lock state, and dealing with conflicts by waiting for the bucket to become free.

I don't think it's realistic to not use locks to protect the lock mapping hashtable, nor does it seem likely that we can make the manipulation of individual lock states entirely atomic. But it very well seems possible to reduce the likelihood of contention: we could e.g. split the maintenance of the hashtable(s) from protecting the state of individual locks. The locks protecting the hashtable would just be held long enough to change a "pin count" of each lock, and a per-LOCK lwlock would then protect each heavyweight lock's state (sketched below). We'd not need to take the partition locks for quite a few cases, because there are many locks in a loaded system that always have lockers.

There'd be a cost to that, needing more atomic ops in some cases, but I think it'd reduce contention tremendously, and it'd open a lot more optimization potential. It seems plausible that we could even work, as a followup, towards not needing the partition locks for some lock releases (e.g. when there are other holders), and we might even be able to avoid them for acquisitions, by caching the LOCK inside the LOCALLOCK and re-identifying its identity.
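A minimal sketch of that split, under the assumptions above; the npins field and the embedded per-lock LWLock are inventions for illustration (today's LOCK struct has neither), and the bookkeeping is drastically simplified:

#include "postgres.h"
#include "storage/lock.h"
#include "storage/lwlock.h"

/* hypothetical hashtable entry: pin count + per-lock lwlock + the LOCK */
typedef struct PinnedLock
{
    int         npins;          /* protected by the partition lock */
    LWLock      state_lock;     /* protects the embedded LOCK's state */
    LOCK        lock;           /* existing heavyweight lock state */
} PinnedLock;

static void
acquire_pinned_lock_sketch(PinnedLock *plock, LWLock *partitionLock,
                           LOCKMODE lockmode)
{
    /* partition lock held only long enough to pin the hashtable entry */
    LWLockAcquire(partitionLock, LW_EXCLUSIVE);
    plock->npins++;
    LWLockRelease(partitionLock);

    /* the per-lock lwlock, not the partition lock, covers the lock state */
    LWLockAcquire(&plock->state_lock, LW_EXCLUSIVE);
    plock->lock.granted[lockmode]++;    /* stand-in for GrantLock() etc. */
    plock->lock.nGranted++;
    LWLockRelease(&plock->state_lock);
}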
Thoughts?

Greetings,

Andres Freund