Improve heavyweight locks instead of building new lock managers?

From: Andres Freund

Hi,

I started this as a reply to
https://www.postgresql.org/message-id/CAA4eK1JMgUfiTdAgr9k3nA4cdKvvruousKBg7FWTDNzQgBpOZA%40mail.gmail.com
but the email seemed to morph into a distinct topic that I thought best
to separate out.

We've had a number of cases where heavyweight locks turned out to be
too, well, heavy. And I don't mean cases where a fixed number of locks
can do the job.  The latest case of this is the above thread, where a
separate lock manager just for relation extension was implemented.

My problem with that patch is that it just seems like the wrong
architectural direction to me. There are two main aspects to this:

1) It basically builds another, more lightweight but less capable, lock
   manager that can lock more objects than we can have distinct locks
   for.  It is faster because it uses *one* hashtable without conflict
   handling, because it has fewer lock modes, and because it doesn't
   support deadlock detection. And probably some other things.

2) A lot of the contention around file extension comes from us doing
   multiple expensive things under one lock (determining the current
   relation size, searching for a victim buffer, extending the file),
   and in tiny increments (growing a 1TB table by 8kb). This patch
   doesn't address that at all.

I'm only planning to address 1) in this thread, and will write
separately about 2) as part of the above thread. So more on that later.

With regard to 1):

To me this seems to go in the direction of having multiple bespoke lock
managers with slightly different feature sets, different debugging /
logging / monitoring support, with separate locking code each.  That's
bad for maintainability.


I think one crucial piece of analysis that I've not seen fully done is
why a separate lock manager is actually faster. An early (quite
different) version of this patch yielded the following micro-benchmark:

https://www.postgresql.org/message-id/CAD21AoD1NenjTmD%3D5ypOBo9%3DFRtAtWVxUcHqHxY3wNos_5Bb5w%40mail.gmail.com
On 2017-11-21 07:19:30 +0900, Masahiko Sawada wrote:
> Also I've done a micro-benchmark; calling LockRelationForExtension and
> UnlockRelationForExtension tightly in order to measure the number of
> lock/unlock cycles per second. The result is,
> PATCHED = 3.95892e+06 (cycles/sec)
> HEAD = 1.15284e+06 (cycles/sec)
> The patched is 3 times faster than current HEAD.

but I've not actually seen an analysis as to *why* precisely there's a
threefold difference in cost, and whether the problem could instead be
addressed by making the difference much smaller.

My guess would be that the difference, to a very large degree, comes from
avoiding dynahash lookups. We currently have quite a few hashtable
lookups:

* LockRelationForExtension
** LockAcquireExtended does HASH_ENTERs in LockMethodLocalHash
** SetupLockInTable does HASH_ENTER_NULL in LockMethodLockHash
** SetupLockInTable does HASH_ENTER_NULL in LockMethodProcLockHash
* UnlockRelationForExtension
** LockRelease does HASH_FIND in LockMethodLocalHash
** CleanUpLock does HASH_REMOVE in LockMethodProcLockHash
** CleanUpLock does HASH_REMOVE in LockMethodLockHash
** RemoveLocalLock does HASH_REMOVE in LockMethodLocalHash

it's pretty easy to believe that a lock mapping like:

+        relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(&tag)];

with

+typedef struct RelExtLockTag
+{
+    Oid            dbid;            /* InvalidOid for a shared relation */
+    Oid            relid;
+} RelExtLockTag;
...
+static inline uint32
+RelExtLockTargetTagToIndex(RelExtLockTag *locktag)
+{
+    return tag_hash(locktag, sizeof(RelExtLockTag)) % N_RELEXTLOCK_ENTS;
+}

and then just waiting until that specific hash entry becomes free is
cheaper than *7* dynahash lookups, especially if doing so doesn't
require any mapping locks.

But it's not at all clear to me that we couldn't avoid some of this
overhead, sometimes or even always. Improving lock.c would be a much
bigger win, without building separate infrastructure.

E.g. we could:

- Have an extended LockAcquire API where the caller provides a stack
  variable for caching information until the LockRelease call, avoiding
  a separate LockMethodLocalHash lookup on release.  (A rough sketch of
  what this and the next point could look like follows after this
  list.)

- Instead of always implementing HASH_REMOVE as a completely fresh
  lookup in dynahash, we should provide an API for removing an object we
  *know* is in the hashtable at a specific pointer. We're obviously
  already relying on that address to stay constant, so we'd not lose
  flexibility.  With this separate API we can avoid the bucket lookup,
  walking through each element in that bucket, and having to compare the
  hash key every time (which is quite expensive for a full LOCKTAG).

  For the biggest benefit, we'd have to make the bucket list doubly
  linked, but even if we just went through from the start and compared
  entries by pointer, we'd win big.

- We should try to figure out whether we can avoid needing the separate
  lock table for PROCLOCKs. As far as I can tell there's not a single
  reason for it to be a hashtable, rather than something like
  NUM_LOCK_PARTITIONS lists of free PROCLOCKs.

- Whenever we do a relation extension, we better already hold a normal
  relation lock. We don't actually need to have an entirely distinct
  type of lock for extensions, that theoretically could support N
  extension locks for each relation.  The fact that these are associated
  could be utilized in different ways:

  - We could define relation extension as a separate lock level only
    conflicting with itself (and perhaps treat AELs as including
    it). That'd avoid needing separate shared memory states in many
    cases.

    This might be too problematic, because extension locks don't have
    the same desired behaviour around group locking (and a small army of
    other issues).

  - We could keep a separate extension lock cached inside the relation
    lock.  The first time a transaction needs to extend, it does the
    normal work, and after that stores the PROCLOCK in the LOCALLOCK (or
    something like that). Once extension is done, don't release the lock
    entirely, but instead just reduce the lock level to a new
    non-conflicting lock level.

    Alternatively we could implement something very similar outside of
    lock.c, e.g. by caching the LOCALLOCK* (possibly identified by an
    integer or such) in RelationData.  That'd require enough support
    from lock.c to be able to make that really cheap.
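
To make the first two points a bit more concrete, here's roughly the
kind of API I'm thinking of - all names and signatures are made up for
illustration, nothing like this exists in lock.c or dynahash today:

#include "storage/lock.h"       /* LOCKTAG, LOCKMODE, LOCALLOCK, ... */
#include "utils/hsearch.h"      /* HTAB */

/*
 * Hypothetical variant of LockAcquire() that fills in a caller-provided
 * handle, so the matching release can go straight to the LOCALLOCK
 * instead of doing a LockMethodLocalHash lookup.
 */
typedef struct LockAcquireHandle
{
    LOCALLOCK  *locallock;      /* filled in by LockAcquireCached() */
} LockAcquireHandle;

extern LockAcquireResult LockAcquireCached(const LOCKTAG *locktag,
                                           LOCKMODE lockmode,
                                           bool dontWait,
                                           LockAcquireHandle *handle);
extern void LockReleaseCached(LockAcquireHandle *handle);

/*
 * Hypothetical dynahash addition: remove an entry via a pointer we
 * already hold, skipping the bucket walk and the key comparison that a
 * HASH_REMOVE hash_search() call has to do.
 */
extern void hash_remove_entry(HTAB *hashp, void *entry);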


The second big difference between lock.c and the proposed relation
extension lock is that the latter doesn't have a "mapping" lock. It
instead solves the mapping without a lock by having exactly one
potential location for each lock, manipulating the lock state with
atomic ops, and dealing with conflicts by waiting for the bucket to
become free.

I don't think it's realistic to not use locks to protect the lock
mapping hashtable, nor does it seem likely we can make the manipulation
of individual lock states entirely atomic.  But it may very well be
possible to reduce the likelihood of contention:

We could e.g. split the maintenance of the hashtable(s) from protecting
the state of individual locks. The locks protecting the hashtable would
just be held long enough to change a "pin count" on each lock, and then
a per-LOCK lwlock would protect each heavyweight lock's state.  We'd
not need to take the partition locks in quite a few cases, because in a
loaded system many locks always have lockers.  There'd be a cost to
that, needing more atomic ops in some cases, but I think it'd reduce
contention tremendously, and it'd open up a lot more optimization
potential.  It seems plausible that, as a followup, we could even avoid
the partition locks for some lock releases (e.g. when there are other
holders), and perhaps even for acquisitions, by caching the LOCK inside
the LOCALLOCK and re-verifying its identity.
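
To visualize the split, a LOCK entry might end up looking very roughly
like this (a mock-up only, with made-up field names; the real work is
in defining when the pin count and the per-lock lock each need to be
held):

typedef struct LOCK
{
    /* hash key, entered/looked up under the partition lock, as today */
    LOCKTAG     tag;

    /*
     * Only this is manipulated while holding the partition lock; a
     * nonzero pin count keeps the entry in the mapping hashtable even
     * when nobody currently holds or awaits the lock.
     */
    pg_atomic_uint32 pincount;

    /* protects all the state below, instead of the partition lock */
    LWLock      content_lock;

    LOCKMASK    grantMask;      /* bitmask for lock types already granted */
    LOCKMASK    waitMask;       /* bitmask for lock types awaited */
    /* ... procLocks, waitProcs, requested/granted counts as today ... */
} LOCK;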

Thoughts?

Greetings,

Andres Freund



Re: Improve heavyweight locks instead of building new lock managers?

From: Andres Freund

Hi,

Some of the discussions about improving the locking code, in particular
the group locking / deadlock detector discussion with Robert, again made
me look at lock.c.  While looking at how much work it'd be to a) remove
the PROCLOCK hashtable and b) move more of the group locking logic into
lock.c, rather than keeping the smarts in deadlock.c, I got sidetracked
by all the verbose and hard-to-read SHM_QUEUE code.

Here's a *draft* patch series for removing all use of SHM_QUEUE, and
subsequently removing SHM_QUEUE. There's a fair bit of polish needed,
but I do think it makes the code considerably easier to read
(particularly for predicate.c). The diffstat is nice too:

 src/include/lib/ilist.h                     | 132 +++++++++++++++++----
 src/include/replication/walsender_private.h |   3 +-
 src/include/storage/lock.h                  |  10 +-
 src/include/storage/predicate_internals.h   |  49 +++-----
 src/include/storage/proc.h                  |  18 +--
 src/include/storage/shmem.h                 |  22 ----
 src/backend/access/transam/twophase.c       |   4 +-
 src/backend/lib/ilist.c                     |   8 +-
 src/backend/replication/syncrep.c           |  89 ++++++--------
 src/backend/replication/walsender.c         |   2 +-
 src/backend/storage/ipc/Makefile            |   1 -
 src/backend/storage/ipc/shmqueue.c          | 190 ------------------------------
 src/backend/storage/lmgr/deadlock.c         |  76 +++++-------
 src/backend/storage/lmgr/lock.c             | 129 ++++++++------------
 src/backend/storage/lmgr/predicate.c        | 692 +++++++++++++++++++++++++++++++++++------------------------------------------------------------------------
 src/backend/storage/lmgr/proc.c             | 197 +++++++++++++------------------
 16 files changed, 569 insertions(+), 1053 deletions(-)

I don't want to invest a lot of time into this if there's not some
agreement that this is a good direction to go into. So I'd appreciate a
few cursory looks before spending more time.

Overview:
0001: Add additional prev/next & detached node functions to ilist.
  I think the additional prev/next accessors are nice. I am less
  convinced by the 'detached' stuff, but it's used by some SHM_QUEUE
  users. I don't want to make the plain dlist_delete reset the node's
  prev/next pointers; it's not needed in the vast majority of cases...

0002: Use dlists instead of SHM_QUEUE for heavyweight locks.
  I mostly removed the odd reliance on PGPROC.links needing to be the
  first struct member.

  I think we should rename PROC_QUEUE.links; elsewhere "links" is used
  for list membership nodes, so reusing the name here is imo confusing.

0003: Use dlist for syncrep queue.
  This seems fairly simple to me.

0004: Use dlists for predicate locking.
  Unfortunately pretty large. I think it's a huge improvement, but it's
  also subtle code. Wonder if there's something better to do here wrt
  OnConflict_CheckForSerializationFailure?

0005: Remove now unused SHMQueue*.
0006: Remove PROC_QUEUE.size.
  I'm not sure whether this is a good idea. I was looking at it primarily
  because I thought it'd allow us to remove PROC_QUEUE as a whole
  if we wanted to. But as PROC_QUEUE.size doesn't really seem to buy us
  much, we should perhaps just do something roughly like in the patch?

Greetings,

Andres Freund


Re: Improve heavyweight locks instead of building new lock managers?

From: Amit Langote

Hi Andres,

On Thu, Feb 20, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:
> Here's a *draft* patch series for removing all use of SHM_QUEUE, and
> subsequently removing SHM_QUEUE. There's a fair bit of polish needed,
> but I do think it makes the code considerably easier to read
> (particularly for predicate.c). The diffstat is nice too:
>
> 0005: Remove now unused SHMQueue*.
> 0006: Remove PROC_QUEUE.size.

Maybe you're aware, but there still seem to be places using them.  In a
LOCK_DEBUG build:

lock.c: In function ‘LOCK_PRINT’:
lock.c:320:20: error: ‘PROC_QUEUE’ {aka ‘const struct PROC_QUEUE’} has
no member named ‘size’
     lock->waitProcs.size,

lock.c: In function ‘DumpLocks’:
lock.c:3906:2: error: unknown type name ‘SHM_QUEUE’; did you mean ‘SI_QUEUE’?

FWIW, I, for one, am all for removing specialized data structures when
more widely used ones will do, especially if the specialization is no
longer relevant.

Thanks,
Amit



Re: Improve heavyweight locks instead of building new lock managers?

From: Thomas Munro

On Thu, Feb 20, 2020 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
>  16 files changed, 569 insertions(+), 1053 deletions(-)

Nice!

Some comments on 0001, 0003, 0004:

> Subject: [PATCH v1 1/6] Add additional prev/next & detached node functions to

+extern void dlist_check(const dlist_head *head);
+extern void slist_check(const slist_head *head);

I approve of the incidental constification in this patch.

+/*
+ * Like dlist_delete(), but also sets next/prev to NULL to signal not being in
+ * list.
+ */
+static inline void
+dlist_delete_thoroughly(dlist_node *node)
+{
+       node->prev->next = node->next;
+       node->next->prev = node->prev;
+       node->next = NULL;
+       node->prev = NULL;
+}

Instead of introducing this strange terminology, why not just have the
callers do ...

dlist_delete(node);
dlist_node_init(node);

..., or perhaps supply dlist_delete_and_reinit(node) that does exactly
that?  That is, reuse the code and terminology.
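
Something like this, I mean - just combining the existing dlist
primitives (untested):

/*
 * Delete 'node' from its list and return it to the same state a freshly
 * initialized node has, so it can later be recognized as not being on a
 * list.
 */
static inline void
dlist_delete_and_reinit(dlist_node *node)
{
    dlist_delete(node);
    node->prev = NULL;
    node->next = NULL;
}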

+/*
+ * Check if node is detached. A node is only detached if it either has been
+ * initialized with dlist_init_node(), or deleted with
+ * dlist_delete_thoroughly().
+ */
+static inline bool
+dlist_node_is_detached(const dlist_node *node)
+{
+    Assert((node->next == NULL && node->prev == NULL) ||
+           (node->next != NULL && node->prev != NULL));
+
+    return node->next == NULL;
+}

How about dlist_node_is_linked()?  I don't like introducing random new
verbs when we already have 'linked' in various places, and these
things are, y'know, linked lists.

> Subject: [PATCH v1 3/6] Use dlist for syncrep queue.

LGTM.

> Subject: [PATCH v1 4/6] Use dlists for predicate locking.

+    dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)

Yuck...  I suppose you could do this:

-    dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)
+    dlist_foreach_const(iter, &reader->outConflicts)

... given:

+/* Variant for when you have a pointer to const dlist_head. */
+#define dlist_foreach_const(iter, lhead)                    \
+    for (AssertVariableIsOfTypeMacro(iter, dlist_iter),            \
+         AssertVariableIsOfTypeMacro(lhead, const dlist_head *),    \
+         (iter).end = (dlist_node *) &(lhead)->head,            \
+         (iter).cur = (iter).end->next ? (iter).end->next : (iter).end;    \
+         (iter).cur != (iter).end;                    \
+         (iter).cur = (iter).cur->next)
+

... or find a way to make dlist_foreach() handle that itself, which
seems pretty reasonable given its remit to traverse lists without
modifying them, though perhaps that would require a different iterator
type...

Otherwise looks OK to me and passes various tests I threw at it.



Re: Improve heavyweight locks instead of building new lock managers?

From: Andres Freund

Hi,

On 2020-02-21 12:40:06 +1300, Thomas Munro wrote:
> On Thu, Feb 20, 2020 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
> >  16 files changed, 569 insertions(+), 1053 deletions(-)
> 
> Nice!

Thanks!


> Some comments on 0001, 0003, 0004:
> 
> > Subject: [PATCH v1 1/6] Add additional prev/next & detached node functions to
> 
> +extern void dlist_check(const dlist_head *head);
> +extern void slist_check(const slist_head *head);
> 
> I approve of the incidental constification in this patch.

It was just necessary fallout :)


> +/*
> + * Like dlist_delete(), but also sets next/prev to NULL to signal not being in
> + * list.
> + */
> +static inline void
> +dlist_delete_thoroughly(dlist_node *node)
> +{
> +       node->prev->next = node->next;
> +       node->next->prev = node->prev;
> +       node->next = NULL;
> +       node->prev = NULL;
> +}
> 
> Instead of introducing this strange terminology, why not just have the
> callers do ...
> 
> dlist_delete(node);
> dlist_node_init(node);

There are quite a few callers in predicate.c - I actually did it that
way first.


> ..., or perhaps supply dlist_delete_and_reinit(node) that does exactly
> that?  That is, reuse the code and terminology.

Yea, that might be better, but see the paragraph below.  I quite dislike
adding any "empty node" state.


> +/*
> + * Check if node is detached. A node is only detached if it either has been
> + * initialized with dlist_init_node(), or deleted with
> + * dlist_delete_thoroughly().
> + */
> +static inline bool
> +dlist_node_is_detached(const dlist_node *node)
> +{
> +    Assert((node->next == NULL && node->prev == NULL) ||
> +           (node->next != NULL && node->prev != NULL));
> +
> +    return node->next == NULL;
> +}
> 
> How about dlist_node_is_linked()?  I don't like introducing random new
> verbs when we already have 'linked' in various places, and these
> things are, y'know, linked lists.

Well, but that name doesn't signal that you can't just do a plain
dlist_delete() and expect dlist_node_is_linked() to notice. I *want* it
to sound "different".  We could of course make delete always reset the
pointers, but I don't want to add that overhead unnecessarily.


> > Subject: [PATCH v1 4/6] Use dlists for predicate locking.
> 
> +    dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)
> 
> Yuck...

It doesn't seem *that* bad to me - it at least signals properly what
we're doing, and only in one place.


> I suppose you could do this:
> 
> -    dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)
> +    dlist_foreach_const(iter, &reader->outConflicts)

We'd need a different iterator type too, I think? Because the iterator
itself can't be constant, but we'd want the elements themselves to be
pointers to const.

const just isn't a very granular thing in C :(.


> ... given:
> 
> +/* Variant for when you have a pointer to const dlist_head. */
> +#define dlist_foreach_const(iter, lhead)                    \
> +    for (AssertVariableIsOfTypeMacro(iter, dlist_iter),            \
> +         AssertVariableIsOfTypeMacro(lhead, const dlist_head *),    \
> +         (iter).end = (dlist_node *) &(lhead)->head,            \
> +         (iter).cur = (iter).end->next ? (iter).end->next : (iter).end;    \
> +         (iter).cur != (iter).end;                    \
> +         (iter).cur = (iter).cur->next)
> +
> 
> ... or find a way to make dlist_foreach() handle that itself, which
> seems pretty reasonable given its remit to traverse lists without
> modifying them, though perhaps that would require a different iterator
> type...

I was trying that first, but I don't easily see how we can do
so. Iterating over a non-constant list with dlist_foreach obviously
still allows one to manipulate the list members. Thus dlist_iter.cur
can't be a 'pointer to const'.  Whereas that's arguably what'd be needed
for a correct dlist_foreach() of a constant list?

We could just accept const pointers for dlist_foreach(), but then we'd
*always* accept them, and we'd thus unconditionally have iter.cur as
non-const.  Would that be better?
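
For the sake of discussion, such an iterator type and loop would
presumably look roughly like this (untested sketch, mirroring the
existing dlist_iter / dlist_foreach definitions, names made up):

/* iterator whose current node is a pointer to const */
typedef struct dlist_const_iter
{
    const dlist_node *cur;      /* current element */
    const dlist_node *end;      /* last node we'll iterate to */
} dlist_const_iter;

#define dlist_foreach_const(iter, lhead)                                \
    for (AssertVariableIsOfTypeMacro(iter, dlist_const_iter),           \
         AssertVariableIsOfTypeMacro(lhead, const dlist_head *),        \
         (iter).end = &(lhead)->head,                                   \
         (iter).cur = (iter).end->next ? (iter).end->next : (iter).end; \
         (iter).cur != (iter).end;                                      \
         (iter).cur = (iter).cur->next)

Callers would then also want a const-aware dlist_container() variant to
get at the containing struct without casting the const away again.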


> Otherwise looks OK to me and passes various tests I threw at it

Thanks!

Greetings,

Andres Freund



Re: Improve heavyweight locks instead of building new lock managers?

From: Robert Haas

On Mon, Feb 10, 2020 at 11:22 PM Andres Freund <andres@anarazel.de> wrote:
> To me this seems to go in the direction of having multiple bespoke lock
> managers with slightly different feature sets, different debugging /
> logging / monitoring support, with separate locking code each.  That's
> bad for maintainability.

I think it would help a lot if the lock manager were not such a
monolithic thing. For instance, suppose that instead of having the
deadlock detector be part of the lock manager, it's a separate thing
that integrates with the lock manager. Instead of only knowing about
waits for heavyweight locks, it knows about whatever you want to tell
it about. Then, for instance, you could tell it that process A is
waiting for a cleanup lock while process B holds a pin, possibly
detecting a deadlock we can't notice today. There are likely other
cases as well.

> E.g. we could:
>
> - Have an extended LockAcquire API where the caller provides a stack
>   variable for caching information until the LockRelease call, avoiding
>   a separate LockMethodLocalHash lookup on release.

Maybe. Seems like it might make things a little clunky for the caller,
though. If you just kept a stack of locks acquired and checked the
lock being released against the top of the stack, you'd probably catch
a large percentage of cases. Or, the caller could pass a flag
indicating whether they intend to release the lock prior to the end of
the transaction (which could be cross-checked by assertions). Any that
you intend to release early go on a stack.
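
A sketch of the shape I have in mind - all hypothetical, living inside
lock.c, and ignoring resource owners, error cleanup, and overflow
handling:

#define EARLY_RELEASE_STACK_SIZE 16

static LOCALLOCK *earlyReleaseStack[EARLY_RELEASE_STACK_SIZE];
static int  earlyReleaseDepth = 0;

/* LockAcquire(), when told the lock will be released early, pushes it */
static void
remember_early_release(LOCALLOCK *locallock)
{
    if (earlyReleaseDepth < EARLY_RELEASE_STACK_SIZE)
        earlyReleaseStack[earlyReleaseDepth++] = locallock;
}

/*
 * LockRelease() would check the top of the stack first, and only fall
 * back to the LockMethodLocalHash lookup on a miss.
 */
static LOCALLOCK *
check_early_release(const LOCKTAG *locktag, LOCKMODE lockmode)
{
    if (earlyReleaseDepth > 0)
    {
        LOCALLOCK  *top = earlyReleaseStack[earlyReleaseDepth - 1];

        if (top->tag.mode == lockmode &&
            memcmp(&top->tag.lock, locktag, sizeof(LOCKTAG)) == 0)
            return earlyReleaseStack[--earlyReleaseDepth];
    }
    return NULL;                /* caller falls back to the hash lookup */
}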

> - Instead of always implementing HASH_REMOVE as a completely fresh
>   lookup in dynahash, we should provide an API for removing an object we
>   *know* is in the hashtable at a specific pointer. We're obviously
>   already relying on that address to stay constant, so we'd not lose
>   flexibility.  With this separate API we can avoid the bucket lookup,
>   walking through each element in that bucket, and having to compare the
>   hash key every time (which is quite expensive for a full LOCKTAG).

That makes a lot of sense, although I wonder if we should go further
and replace dynahash entirely.

> - We should try to figure out whether we can avoid needing the separate
>   lock table for PROCLOCKs. As far as I can tell there's not a single
>   reason for it to be a hashtable, rather than something like
>   NUM_LOCK_PARTITIONS lists of free PROCLOCKs.

I agree that the PROCLOCK thing seems ugly and inefficient. I'm not
sure that a bunch of lists is the best answer, though it might be. One
case to consider is when a lock is initially acquired via the
fast-path and then, because of a conflicting lock acquisition,
transferred by *some other process* to the main lock table. If this
occurs, the original locker must be able to find its PROCLOCK. It
doesn't have to be crazy efficient because it shouldn't happen very
often, but it shouldn't suck too much.

> - Whenever we do a relation extension, we better already hold a normal
>   relation lock. We don't actually need to have an entirely distinct
>   type of lock for extensions, that theoretically could support N
>   extension locks for each relation.  The fact that these are associated
>   could be utilized in different ways:
>
>   - We could define relation extension as a separate lock level only
>     conflicting with itself (and perhaps treat AELs as including
>     it). That'd avoid needing separate shared memory states in many
>     cases.
>
>     This might be too problematic, because extension locks don't have
>     the same desired behaviour around group locking (and a small army of
>     other issues).

Yeah, I don't think that's likely to work out very nicely.

>   - We could keep a separate extension lock cached inside the relation
>     lock.  The first time a transaction needs to extend, it does the
>     normal work, and after that stores the PROCLOCK in the LOCALLOCK (or
>     something like that). Once extension is done, don't release the lock
>     entirely, but instead just reduce the lock level to a new
>     non-conflicting lock level.
>
>     Alternatively we could implement something very similar outside of
>     lock.c, e.g. by caching the LOCALLOCK* (possibly identified by an
>     integer or such) in RelationData.  That'd require enough support
>     from lock.c to be able to make that really cheap.

Not sure I quite see what you mean here.

> We could e.g. split the maintenance of the hashtable(s) from protecting
> the state of individual locks. The locks protecting the hashtable would
> just be held long enough to change a "pin count" on each lock, and then
> a per-LOCK lwlock would protect each heavyweight lock's state.  We'd
> not need to take the partition locks in quite a few cases, because in a
> loaded system many locks always have lockers.  There'd be a cost to
> that, needing more atomic ops in some cases, but I think it'd reduce
> contention tremendously, and it'd open up a lot more optimization
> potential.  It seems plausible that, as a followup, we could even avoid
> the partition locks for some lock releases (e.g. when there are other
> holders), and perhaps even for acquisitions, by caching the LOCK inside
> the LOCALLOCK and re-verifying its identity.

I agree. This seems worth exploring. The idea of caching the probable
location of the lock and re-pinning it to check whether it's the one
you expected seems like it would avoid false sharing in a lot of
practical cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improve heavyweight locks instead of building new lock managers?

From: Robert Haas

On Wed, Feb 19, 2020 at 11:14 PM Andres Freund <andres@anarazel.de> wrote:
> Here's a *draft* patch series for removing all use of SHM_QUEUE, and
> subsequently removing SHM_QUEUE.

+1 for that idea.

But color me skeptical of what Thomas described as the "incidental
constification".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company