Re: Reducing overhead of frequent table locks - Mailing list pgsql-hackers

From: Noah Misch
Subject: Re: Reducing overhead of frequent table locks
Msg-id: 20110524090734.GA18831@tornado.gateway.2wire.net
In response to: Re: Reducing overhead of frequent table locks (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Reducing overhead of frequent table locks (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Mon, May 23, 2011 at 09:15:27PM -0400, Robert Haas wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah@leadboat.com> wrote:
> >         if (level >= ShareUpdateExclusiveLock)
> >                 ++strong_lock_counts[my_strong_lock_count_partition]
> >                 sfence
> >                 if (strong_lock_counts[my_strong_lock_count_partition] == 1)
> >                         /* marker 1 */
> >                         import_all_local_locks
> >                 normal_LockAcquireEx
> >         else if (level <= RowExclusiveLock)
> >                 lfence
> >                 if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> >                         /* marker 2 */
> >                         local_only
> >                         /* marker 3 */
> >                 else
> >                         normal_LockAcquireEx
> >         else
> >                 normal_LockAcquireEx
> >
> > At marker 1, we need to block until no code is running between markers two and
> > three.  You could do that with a per-backend lock (LW_SHARED by the strong
> > locker, LW_EXCLUSIVE by the backend).  That would probably still be a win over
> > the current situation, but it would be nice to have something even cheaper.
> 
> Barring some brilliant idea, or anyway for a first cut, it seems to me
> that we can adjust the above pseudocode by assuming the use of a
> LWLock.  In addition, two other adjustments: first, the first line
> should test level > ShareUpdateExclusiveLock, rather than >=, per
> previous discussion.  Second, import_all_local_locks needn't really
> move everything; just those locks with a matching locktag.  Thus:
> 
> !        if (level > ShareUpdateExclusiveLock)
> !                ++strong_lock_counts[my_strong_lock_count_partition]
> !                sfence
> !                for each backend
> !                        take per-backend lwlock for target backend
> !                        transfer fast-path entries with matching locktag
> !                        release per-backend lwlock for target backend
> !                normal_LockAcquireEx
> !        else if (level <= RowExclusiveLock)
> !                lfence
> !                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> !                        take per-backend lwlock for own backend
> !                        fast-path lock acquisition
> !                        release per-backend lwlock for own backend
> !                else
> !                        normal_LockAcquireEx
> !        else
> !                normal_LockAcquireEx

This drops the part about only transferring fast-path entries once when a
strong_lock_counts cell transitions from zero to one.  Granted, that itself
requires some yet-undiscussed locking.  For that matter, we can't have
multiple strong lockers completing transfers on the same cell in parallel.
Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
strong locker holds for that entire "if" body and while decrementing the
strong_lock_counts cell at lock release.
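
Roughly what I'm picturing, with strong_lock_counts as above and everything
else (FastPathTransferLocks, the helper, the partition count) invented purely
for illustration:

    /* Sketch only; strong_lock_counts and FastPathTransferLocks would
     * really live in shared memory, not as statics. */
    #define FP_PARTITIONS 16

    static int      strong_lock_counts[FP_PARTITIONS];
    static LWLockId FastPathTransferLocks[FP_PARTITIONS];

    static void TransferAllMatchingFastPathLocks(const LOCKTAG *locktag);

    static void
    StrongLockAcquire(int partition, const LOCKTAG *locktag)
    {
        LWLockAcquire(FastPathTransferLocks[partition], LW_EXCLUSIVE);
        if (strong_lock_counts[partition]++ == 0)
            TransferAllMatchingFastPathLocks(locktag);  /* only on 0 -> 1 */
        LWLockRelease(FastPathTransferLocks[partition]);
        /* ... then fall through to normal_LockAcquireEx ... */
    }

    static void
    StrongLockRelease(int partition)
    {
        /* same lock covers the decrement, so no transfer runs concurrently */
        LWLockAcquire(FastPathTransferLocks[partition], LW_EXCLUSIVE);
        strong_lock_counts[partition]--;
        LWLockRelease(FastPathTransferLocks[partition]);
    }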

As far as the level of detail of this pseudocode goes, there's no need to hold
the per-backend LWLock while transferring the fast-path entries.  You just
need to hold it sometime between bumping strong_lock_counts and transferring
the backend's locks.  This ensures that, for example, the backend is not
sleeping in the middle of a fast-path lock acquisition for the whole duration
of this code.
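
That is, per target backend, something like the following (names invented;
I'm leaving aside whatever synchronization the fast-path array itself ends up
needing):

    static void
    TransferOneBackend(PGPROC *proc, const LOCKTAG *locktag)
    {
        /*
         * Handshake only: once we've held and released this LWLock, any
         * fast-path acquisition that started before our increment of
         * strong_lock_counts became visible has completed, and later ones
         * will see the nonzero count and take the slow path.
         */
        LWLockAcquire(proc->fastPathLock, LW_EXCLUSIVE);
        LWLockRelease(proc->fastPathLock);

        /* The copying itself can run without holding the LWLock. */
        TransferMatchingFastPathEntries(proc, locktag);
    }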

> Now, a small fly in the ointment is that we haven't got, with
> PostgreSQL, a portable library of memory primitives.  So there isn't
> an obvious way of doing that sfence/lfence business.

I was thinking that, if the final implementation could benefit from memory
barrier interfaces, we should create those interfaces now.  Start with only a
platform-independent dummy implementation that runs a lock/unlock cycle on a
spinlock residing in backend-local memory.  I'm 75% sure that would be
sufficient on all architectures for which we support spinlocks.  It may turn
out that we can't benefit from such interfaces at this time ...
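
Concretely, something like this for the dummy implementation (the macro name
and file placement are placeholders; only the shape matters):

    #include "storage/spin.h"

    /* One spinlock per backend, in ordinary backend-local memory, so a
     * lock/unlock cycle buys us the hardware fences without any chance of
     * contention.  SpinLockInit() would run once at backend start. */
    static slock_t dummy_barrier_lock;

    #define pg_memory_barrier() \
        do { \
            SpinLockAcquire(&dummy_barrier_lock); \
            SpinLockRelease(&dummy_barrier_lock); \
        } while (0)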

> Now, it seems to
> me that in the "strong lock" case, the sfence isn't really needed
> anyway, because we're about to start acquiring and releasing an lwlock
> for every backend, and that had better act as a full memory barrier
> anyhow, or we're doomed.  The "weak lock" case is more interesting,
> because we need the fence before we've taken any LWLock.

Agreed.

> But perhaps it'd be sufficient to just acquire the per-backend lwlock
> before checking strong_lock_counts[].  If, as we hope, we get back a
> zero, then we do the fast-path lock acquisition, release the lwlock,
> and away we go.  If we get back any other value, then we've wasted an
> lwlock acquisition cycle.  Or actually maybe not: it seems to me that
> in that case we'd better transfer all of our fast-path entries into
> the main hash table before trying to acquire any lock the slow way, at
> least if we don't want the deadlock detector to have to know about the
> fast-path.  So then we get this:
> 
> !        if (level > ShareUpdateExclusiveLock)
> !                ++strong_lock_counts[my_strong_lock_count_partition]
> !                for each backend
> !                        take per-backend lwlock for target backend
> !                        transfer fastpath entries with matching locktag
> !                        release per-backend lwlock for target backend
> !        else if (level <= RowExclusiveLock)
> !                take per-backend lwlock for own backend
> !                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> !                        fast-path lock acquisition
> !                        done = true
> !                else
> !                        transfer all fastpath entries
> !                release per-backend lwlock for own backend
> !        if (!done)
> !                normal_LockAcquireEx

Could you elaborate on the last part (the need for "else transfer all fastpath
entries") and, specifically, how it aids deadlock avoidance?  I didn't think
this change would have any impact on deadlocks, because all relevant locks
will be in the global lock table before any call to normal_LockAcquireEx.

To validate the locking at this level of detail, I think we need to sketch the
unlock protocol, too.  On each strong lock release, we'll decrement the
strong_lock_counts cell.  No particular interlock with fast-path lockers
should be needed; a stray AccessShareLock needlessly making it into the global
lock table is no problem.  As mentioned above, we _will_ need an interlock
with lock transfer operations.  How will transferred fast-path locks get
removed from the global lock table?  Presumably, the original fast-path locker
should do so at transaction end; anything else would contort the life cycle.
On top of that, the backend needs a way to know which of its locks have been
transferred, plus an interlock against concurrent transfer operations.  Maybe
that's all.
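
For the fast-path locker's side of that, I'm imagining something on the order
of the following at transaction end (the slot layout, the "transferred" flag,
FP_LOCK_SLOTS, and the per-backend LWLock name are all invented for
illustration):

    static void
    AtEOXact_ReleaseFastPathLocks(void)
    {
        int     i;

        LWLockAcquire(MyProc->fastPathLock, LW_EXCLUSIVE);
        for (i = 0; i < FP_LOCK_SLOTS; i++)
        {
            FastPathLockSlot *slot = &MyProc->fpLocks[i];

            if (!slot->inUse)
                continue;
            if (slot->transferred)
            {
                /* a strong locker moved it into the shared lock table */
                LockRelease(&slot->locktag, slot->mode, false);
            }
            slot->inUse = false;
        }
        LWLockRelease(MyProc->fastPathLock);
    }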

> That seems like it ought to work, at least assuming the position of
> your fencing instructions was correct in the first place.  But there's
> one big problem to worry about: what happens if the lock transfer
> fails due to shared memory exhaustion?  It's not so bad in the "weak
> lock" case; it'll feel just like the already-existing case where you
> try to push another lock into the shared-memory hash table and there's
> no room.  Essentially you've been living on borrowed time anyway.  On
> the other hand, the "strong lock" case is a real problem, because a
> large number of granted fast-path locks can effectively DOS any strong
> locker, even one that wouldn't have conflicted with them.  That's
> clearly not going to fly, but it's not clear to me what the best way
> is to patch around it.

To put it another way: the current system is fair; the chance of hitting lock
exhaustion is independent of lock level.  The new system would be unfair; lock
exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock
acquisition, through no fault of that transaction.  I agree this isn't ideal,
but it doesn't look to me like an unacceptable weakness.  Making lock slots
first-come, first-served is inherently unfair; we're not at all set up to
justly arbitrate between mutually-hostile lockers competing for slots.  The
overall situation will get better, not worse, for the admin who wishes to
defend against hostile unprivileged users attempting a lock table DOS.

Thanks,
nm

