From Andres Freund
Subject Improve heavyweight locks instead of building new lock managers?
Date
Msg-id 20200211042229.msv23badgqljrdg2@alap3.anarazel.de
Hi,

I started this as a reply to
https://www.postgresql.org/message-id/CAA4eK1JMgUfiTdAgr9k3nA4cdKvvruousKBg7FWTDNzQgBpOZA%40mail.gmail.com
but the email seemed to morph into a distinct topic that I thought it
best to separate out.

We've had a number of cases where heavyweight locks turned out to be
too, well, heavy. And I don't mean cases where a fixed number of locks
can do the job.  The last case of this is the above thread, where a
separate lock manager just for relation extension was implemented.

My problem with that patch is that it just seems like the wrong
direction architecturally to me. There are two main aspects to this:

1) It basically builds another, more lightweight but less capable,
   lock manager that can lock more objects than we can have distinct
   locks for.  It is faster because it uses *one* hashtable without
   conflict handling, because it has fewer lock modes, and because it
   doesn't support detecting deadlocks. And probably some other things.

2) A lot of the contention around file extension comes from us doing
   multiple expensive things under one lock (determining current
   relation size, searching victim buffer, extending file), and in tiny
   increments (growing a 1TB table by 8kb). This patch doesn't address
   that at all.

I'm only planning to address 1) in this thread, and will write a separate
email about 2) as part of the above thread. So more on that later.

With regard to 1):

To me this seems to go in the direction of having multiple bespoke lock
managers with slightly different feature sets, different debugging /
logging / monitoring support, and separate locking code for each.  That's
bad for maintainability.


I think one crucial piece of analysis that I've not seen fully done is
why a separate lock manager is actually faster. An early (quite
different) version of this patch yielded the following micro-benchmark:

https://www.postgresql.org/message-id/CAD21AoD1NenjTmD%3D5ypOBo9%3DFRtAtWVxUcHqHxY3wNos_5Bb5w%40mail.gmail.com
On 2017-11-21 07:19:30 +0900, Masahiko Sawada wrote:
> Also I've done a micro-benchmark; calling LockRelationForExtension and
> UnlockRelationForExtension tightly in order to measure the number of
> lock/unlock cycles per second. The result is,
> PATCHED = 3.95892e+06 (cycles/sec)
> HEAD = 1.15284e+06 (cycles/sec)
> The patched is 3 times faster than current HEAD.

but I've not actually seen an analysis as to *why* precisely there's a
threefold difference in cost, and whether the problem could instead be
addressed by making the difference much smaller.

My guess would be that the difference, to a very large degree, comes from
avoiding dynahash lookups. We currently have quite a few hashtable
lookups:

* LockRelationForExtension
** LockAcquireExtended does HASH_ENTERs in LockMethodLocalHash
** SetupLockInTable does HASH_ENTER_NULL in LockMethodLockHash
** SetupLockInTable does HASH_ENTER_NULL in LockMethodProcLockHash
* UnlockRelationForExtension
** LockRelease does HASH_FIND in LockMethodLocalHash
** CleanUpLock does HASH_REMOVE in LockMethodProcLockHash
** CleanUpLock does HASH_REMOVE in LockMethodLockHash
** RemoveLocalLock does HASH_REMOVE in LockMethodLocalHash

It's pretty easy to believe that a lock mapping like:

+        relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(&tag)];

with

+typedef struct RelExtLockTag
+{
+    Oid            dbid;            /* InvalidOid for a shared relation */
+    Oid            relid;
+} RelExtLockTag;
...
+static inline uint32
+RelExtLockTargetTagToIndex(RelExtLockTag *locktag)
+{
+    return tag_hash(locktag, sizeof(RelExtLockTag)) % N_RELEXTLOCK_ENTS;
+}

and then just waiting till that specific hash entry becomes free, is
cheaper than *7* dynahash lookups. Especially if doing so doesn't
require any mapping locks.

But it's not at all clear to me that we couldn't avoid some of this
overhead, at least sometimes if not always. Improving lock.c would be a
much bigger win, without building separate infrastructure.

E.g. we could:

- Have an extended LockAcquire API where the caller provides a stack
  variable for caching information until the LockRelease call, avoiding
  separate LockMethodLocalHash lookup for release.
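
  As a very rough sketch, purely to illustrate the shape of such an API (the
  handle type and the function names below are invented, they don't exist in
  lock.c today):

      /* Caller-owned handle caching the LOCALLOCK looked up at acquire time */
      typedef struct LockAcquireHandle
      {
          LOCALLOCK  *locallock;
      } LockAcquireHandle;

      extern LockAcquireResult LockAcquireHandled(const LOCKTAG *locktag,
                                                  LOCKMODE lockmode,
                                                  bool sessionLock,
                                                  bool dontWait,
                                                  LockAcquireHandle *handle);

      /* Release via the cached LOCALLOCK, skipping LockMethodLocalHash */
      extern bool LockReleaseHandled(LockAcquireHandle *handle,
                                     LOCKMODE lockmode);

  LockRelationForExtension() / UnlockRelationForExtension() could then keep
  such a handle in a local variable for the duration of the extension.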

- Instead of always implementing HASH_REMOVE as a completely fresh
  lookup in dynahash, we should provide an API for removing an object we
  *know* is in the hashtable at a specific pointer. We're obviously
  already relying on that address to stay constant, so we'd not lose
  flexibility.  With this separate API we can avoid the bucket lookup,
  walking through each element in that bucket, and having to compare the
  hash key every time (which is quite expensive for a full LOCKTAG).

  For the biggest benefit, we'd have to make the bucket list doubly
  linked, but even if we were to just go through from start, and just
  compare entries by pointer, we'd win big.
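
  Purely to illustrate the proposed interface (the function below does not
  exist in dynahash today, the name is made up):

      /*
       * Remove an entry previously returned by hash_search(), identified by
       * its pointer rather than by key.  With the current singly-linked
       * bucket lists we'd still walk the bucket, but compare element
       * pointers instead of hashing and comparing the full key; with a
       * doubly-linked bucket list even the walk goes away.
       */
      extern void hash_remove_entry(HTAB *hashp, void *entry);

  CleanUpLock() and RemoveLocalLock() could then pass the LOCK / PROCLOCK /
  LOCALLOCK pointers they already have, instead of re-deriving the entries
  from their tags.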

- We should try to figure out whether we can avoid needing the separate
  lock table for PROCLOCKs. As far as I can tell there's not a single
  reason for it to be a hashtable, rather than something like
  NUM_LOCK_PARTITIONS lists of free PROCLOCKs.
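
  To illustrate the shape of that (names invented, and glossing over how
  existing PROCLOCKs would then be found, e.g. via the LOCK's procLocks
  list):

      /*
       * Hypothetical freelists replacing LockMethodProcLockHash; one per
       * lock partition, protected by that partition's lock.
       */
      typedef struct ProcLockFreeList
      {
          dlist_head  freeProcLocks;  /* unused PROCLOCK entries */
      } ProcLockFreeList;

      /* NUM_LOCK_PARTITIONS entries, living in shared memory */
      static ProcLockFreeList *ProcLockFreeLists;

  Allocating a PROCLOCK would pop an entry off the partition's freelist,
  releasing one would push it back, and neither would need a dynahash
  lookup.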

- Whenever we do a relation extension, we'd better already hold a normal
  relation lock. We don't actually need an entirely distinct type of lock
  for extensions, one that theoretically could support N extension locks
  for each relation.  The fact that these are associated
  could be utilized in different ways:

  - We could define relation extension as a separate lock level only
    conflicting with itself (and perhaps treat AELs as including
    it). That'd avoid needing separate shared memory states in many
    cases.

    This might be too problematic, because extension locks don't have
    the same desired behaviour around group locking (and a small army of
    other issues).

  - We could keep a separate extension lock cached inside the relation
    lock.  The first time a transaction needs to extend, it does the
    normal work, and after that stores the PROCLOCK in the LOCALLOCK (or
    something like that). Once extension is done, don't release the lock
    entirely, but instead just reduce the lock level to a new
    non-conflicting lock level.

    Alternatively we could implement something very similar outside of
    lock.c, e.g. by caching the LOCALLOCK* (possibly identified by an
    integer or such) in RelationData.  That'd require enough support
    from lock.c to be able to make that really cheap.
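
    To make that variant a little more concrete, a very rough sketch (the
    field name is invented here, and plenty of details are glossed over):

        typedef struct RelationData
        {
            /* ... existing fields ... */

            /* cached local lock state for relation extension, if any */
            struct LOCALLOCK *rd_extlock;
        } RelationData;

    LockRelationForExtension() would go through the full LockAcquire() path
    only the first time per backend, and afterwards hand rd_extlock straight
    back to lock.c, provided lock.c gains an entry point accepting an
    already looked-up LOCALLOCK (and a cheap way to re-validate it).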


The second big difference between lock.c and the proposed relation
extension lock is that it doesn't have a "mapping" lock. Instead it
solves the mapping without a lock by having exactly one potential
location for each lock, using atomic ops to manipulate the lock state,
and dealing with conflicts by waiting for the bucket to become free.

I don't think it's realistic to not use locks to protect the lock
mapping hashtable, nor does it seem likely we can make the manipulation
of individual lock states entirely atomic.  But it very well seems
possible to reduce the likelihood of contention:

We could e.g. split the maintenance of the hashtable(s) from protecting
the state of individual locks. The locks protecting the hashtable would
just be held long enough to change a "pin count" of each lock, and then
a per LOCK lwlock would protect each heavyweight lock's state.  We'd not
need to take the partition locks in quite a few cases, because many locks
in a loaded system always have lockers.  There'd be a cost
to that, needing more atomic ops in some cases, but I think it'd reduce
contention tremendously, and it'd open a lot more optimization
potential.  It seems plausible that, as a followup, we could even get to
not needing the partition locks for some lock releases (e.g. when there
are other holders), and we might be able to avoid them for acquisitions
too, by caching the LOCK inside the LOCALLOCK and re-validating its
identity.
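
To make that a little more concrete, the shared LOCK struct could end up
looking roughly like this (the new fields and their names are just for
illustration, not a worked-out design):

    typedef struct LOCK
    {
        LOCKTAG     tag;        /* unique identifier of lockable object */

        /*
         * Maintained under the partition lock (or atomically, to allow
         * skipping it later): keeps the entry from being removed from the
         * mapping hashtable while somebody still references it.
         */
        pg_atomic_uint32 pincount;

        /* protects the per-lock state below, instead of the partition lock */
        LWLock      content_lock;

        LOCKMASK    grantMask;  /* bitmask for lock types already granted */
        LOCKMASK    waitMask;   /* bitmask for lock types awaited */
        /* ... procLocks, waitProcs, requested/granted counts, as today ... */
    } LOCK;

Acquiring an already-mapped lock would then need the partition lock only
long enough to bump the pincount, plus the per-lock content_lock for the
actual state change.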

Thoughts?

Greetings,

Andres Freund


