Thread: Lightweight locking primitive

Lightweight locking primitive

From

Matthew Kirkwood

Date:

12 March 2002, 16:25:45

Hi,

It seems that the Linux kernel will very shortly acquire a lightweight
userlevel locking primitive (called futexes), thanks primarily to Rusty
and Hubertus.  It looks to be very useful for the sort of locking that
databases of various types need to do.

They have a bunch of rather nice properties:

a) low overhead in the no-contention case - a single locked  instruction on i386
b) no kernel overhead for non-contended locks - make as  many as you like, the kernel memory cost is only  O(number of
lockswith waiters)
 
c) are interruptible / restartable across signals
d) the name :-)

They don't do:

a) deadlock detection
b) cleanup on process exit -- the kernel doesn't know who  had the lock, so it can't help here

A reader/writer version is available, though it's currently implemented
with two futexes.  Spin-for-a-while-before-sleeping versions are planned.

The API looks like this:
/* these can be stored anywhere -- mmapped file, * sysv shm, shared anonymous memory */struct futex lock;
/* this does mprotect(.., ...|PROT_SEM, ...); * seemingly some architectures need to do odd * things to get
atomic/coherentmemory */if(futex_region(lock, sizeof(lock)))    fail("futexes not available on this kernel");
 
futex_init(&lock);
...
/* grab the lock -- we also have a futex_trydown */if(futex_down(&lock))    fail("couldn't get lock");
/* critical section */
/* release lock */futex_up(&lock);


We're looking for interesting applications to try this stuff out on.
Are there:

a) parts of postgresql which would like this, or
b) changes to the interface (or feature set) which  would make it more suited

?

The LWLocks look not unlike this, but my impression is that they are
low-contention, so any improvement would be small.

Any pointers into appropriate bits of source would be greatly appreciated.

Thanks,

Matthew.

Re: Lightweight locking primitive

From

Igor Kovalenko

Date:

12 March 2002, 16:49:27

Matthew Kirkwood wrote:
> 
> Hi,
> 
> It seems that the Linux kernel will very shortly acquire a lightweight
> userlevel locking primitive (called futexes), thanks primarily to Rusty
> and Hubertus.  It looks to be very useful for the sort of locking that
> databases of various types need to do.
> 
> They have a bunch of rather nice properties:

I am curious how 'futexes' are different/better than POSIX (pthread
style) mutexes?

> 
> a) low overhead in the no-contention case - a single locked
>    instruction on i386

should be same for pthread_mutex_lock()

> b) no kernel overhead for non-contended locks - make as
>    many as you like, the kernel memory cost is only
>    O(number of locks with waiters)

Well it can't have kernel overhead for non-contended locks if a
non-contended lock is one opcode, can it?

> c) are interruptible / restartable across signals

Not sure what 'restartable' means? Do you mean locking primitives would
restarted by kernel when interrupted by signals? Like kernel calls with
SA_RESTART set? How that would be possible if kernel does not even know
about non-contended locks?

> d) the name :-)
> 
> They don't do:
> 
> a) deadlock detection
> b) cleanup on process exit -- the kernel doesn't know who
>    had the lock, so it can't help here
> 
> A reader/writer version is available, though it's currently implemented
> with two futexes.  Spin-for-a-while-before-sleeping versions are planned.
> 

RW locks are defined by POSIX too and can be implemented by mutex +
condvar. I wonder what is wrong with those... At the same time Linux has
POSIX semaphores which can not be shared across processes, making them
quite useless. Fixing that could help postgres quite a bit more I
think...

-- igor

Re: Lightweight locking primitive

From

Doug McNaught

Date:

12 March 2002, 17:18:45

Igor Kovalenko <Igor.Kovalenko@motorola.com> writes:

> Matthew Kirkwood wrote:
> > 
> > Hi,
> > 
> > It seems that the Linux kernel will very shortly acquire a lightweight
> > userlevel locking primitive (called futexes), thanks primarily to Rusty
> > and Hubertus.  It looks to be very useful for the sort of locking that
> > databases of various types need to do.
> > 
> > They have a bunch of rather nice properties:
> 
> I am curious how 'futexes' are different/better than POSIX (pthread
> style) mutexes?

They're basically the same thing.  Currently, pthread_mutexes on Linux
(implemented in glibc) are fairly gross in the contended case, since
there is no clean way to wait for lock release, and they interact
fairly nastily with signal semantics.  The futex patches create a new
system call which lets you cleanly wait for a locked futex (an
unlocking task checks for waiting lockers and calls into the kernel
for wakeups if it finds any).

There's no reason that POSIX mutextes and semaphores couldn't be
implemented on top of futexes, usable both in threaded and
multiprocess shared-memory environments. 

> Not sure what 'restartable' means? Do you mean locking primitives would
> restarted by kernel when interrupted by signals? Like kernel calls with
> SA_RESTART set? How that would be possible if kernel does not even know
> about non-contended locks?

I interpret the above as meaning: contended case (blocked in
futex_wait syscall or whatever it's called) can be cleanly interrupted
and by a signal and restarted automatically.

> RW locks are defined by POSIX too and can be implemented by mutex +
> condvar. I wonder what is wrong with those...

There's no conflict between POSIX locks and futexes; the latter are
just a good, new way to implement the former.

>                                           At the same time Linux has
> POSIX semaphores which can not be shared across processes, making them
> quite useless. Fixing that could help postgres quite a bit more I
> think...

Yes, having mutexes and semaphores shareable by different processes
is one of the benefits of the new locks as I understand them.

-Doug
-- 
Doug McNaught       Wireboard Industries      http://www.wireboard.com/
     Custom software development, systems and network consulting.     Java PostgreSQL Enhydra Python Zope Perl Apache
LinuxBSD...

Re: Lightweight locking primitive

From

Igor Kovalenko

Date:

12 March 2002, 18:52:34

You should take a look at pthread_mutex_setpshared(). May be they don't
in Linux, but that's just consequence of braindead implementation.

-- igor

Hubertus Franke wrote:
> 
> Biggest difference, FUTEX work across address spaces, pthread_mutexes don't
> !
> 
> Hubertus Franke
> Enterprise Linux Group (Mgr),  Linux Technology Center (Member Scalability)
> , OS-PIC (Chair)
> email: frankeh@us.ibm.com
> (w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003
> 
> Igor Kovalenko <Igor.Kovalenko@motorola.com> on 03/12/2002 04:48:23 PM
> 
> To:
> cc:    pgsql-hackers@postgresql.org, Hubertus Franke/Watson/IBM@IBMUS,
>        rusty@rustcorp.com.au
> Subject:    Re: [HACKERS] Lightweight locking primitive
> 
> Matthew Kirkwood wrote:
> >
> > Hi,
> >
> > It seems that the Linux kernel will very shortly acquire a lightweight
> > userlevel locking primitive (called futexes), thanks primarily to Rusty
> > and Hubertus.  It looks to be very useful for the sort of locking that
> > databases of various types need to do.
> >
> > They have a bunch of rather nice properties:
> 
> I am curious how 'futexes' are different/better than POSIX (pthread
> style) mutexes?
> 
> >
> > a) low overhead in the no-contention case - a single locked
> >    instruction on i386
> 
> should be same for pthread_mutex_lock()
> 
> > b) no kernel overhead for non-contended locks - make as
> >    many as you like, the kernel memory cost is only
> >    O(number of locks with waiters)
> 
> Well it can't have kernel overhead for non-contended locks if a
> non-contended lock is one opcode, can it?
> 
> > c) are interruptible / restartable across signals
> 
> Not sure what 'restartable' means? Do you mean locking primitives would
> restarted by kernel when interrupted by signals? Like kernel calls with
> SA_RESTART set? How that would be possible if kernel does not even know
> about non-contended locks?
> 
> > d) the name :-)
> >
> > They don't do:
> >
> > a) deadlock detection
> > b) cleanup on process exit -- the kernel doesn't know who
> >    had the lock, so it can't help here
> >
> > A reader/writer version is available, though it's currently implemented
> > with two futexes.  Spin-for-a-while-before-sleeping versions are planned.
> >
> 
> RW locks are defined by POSIX too and can be implemented by mutex +
> condvar. I wonder what is wrong with those... At the same time Linux has
> POSIX semaphores which can not be shared across processes, making them
> quite useless. Fixing that could help postgres quite a bit more I
> think...
> 
> -- igor

Re: Lightweight locking primitive

From

Bruce Momjian

Date:

12 March 2002, 19:09:53

Doug McNaught wrote:
> They're basically the same thing.  Currently, pthread_mutexes on Linux
> (implemented in glibc) are fairly gross in the contended case, since
> there is no clean way to wait for lock release, and they interact
> fairly nastily with signal semantics.  The futex patches create a new
> system call which lets you cleanly wait for a locked futex (an
> unlocking task checks for waiting lockers and calls into the kernel
> for wakeups if it finds any).

Strange that it doesn't wait for the lock.  BSD/OS has:
    The pthread_mutex_lock() function locks the mutex pointed to by mutex. If    mutex is already locked, the calling
threadwill block until the mutex    becomes available.  Upon success the pthread_mutex_lock() function re-    turns
withthe mutex locked and the calling thread as its owner.
 

In fact, they have a pthread_mutex_trylock() version that doesn't block:
    The pthread_mutex_trylock() function performs a non-blocking mutex lock    operation.  It behaves exactly like
pthread_mutex_lock()except that if    any thread (including the calling thread) currently owns the mutex, an
immediateerror return is performed.
 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Lightweight locking primitive

From

Matthew Kirkwood

Date:

12 March 2002, 19:52:56

On Tue, 12 Mar 2002, Bruce Momjian wrote:

> > They're basically the same thing.  Currently, pthread_mutexes on Linux
> > (implemented in glibc) are fairly gross in the contended case, since
> > there is no clean way to wait for lock release,

> Strange that it doesn't wait for the lock.
[..]

It does wait, in that the call will not return before or unless
the thread has acquired the lock.  However, it waits in an ugly
way, via spin-and-yield or some evil signal or pipe hackery via
a manager thread.

pthread_mutexes are fairly ugly, but they should still be
lightweight.  Until now, there was no way to do that under
Linux.  (I don't know how the other free Unixes do it, but I
suspect it is not much better.)

Matthew.

Re: Lightweight locking primitive

From

Doug McNaught

Date:

12 March 2002, 23:30:12

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Doug McNaught wrote:
> > They're basically the same thing.  Currently, pthread_mutexes on Linux
> > (implemented in glibc) are fairly gross in the contended case, since
> > there is no clean way to wait for lock release, and they interact
> > fairly nastily with signal semantics.  The futex patches create a new
> > system call which lets you cleanly wait for a locked futex (an
> > unlocking task checks for waiting lockers and calls into the kernel
> > for wakeups if it finds any).
> 
> Strange that it doesn't wait for the lock.  BSD/OS has:

It does wait.  If the lock is already locked (atomic test in
userspace) the process makes a futex_wait system call, which puts the
process on a kernel waitqueue. 

From what I can see, the new Linux locks are not a replacement for
POSIX locks/semaphores, they're simply a fairly clean way of
implementing them.  They also just went into the development kernel,
so any appearance in production systems may take a few months at
least. 

-Doug
-- 
Doug McNaught       Wireboard Industries      http://www.wireboard.com/
     Custom software development, systems and network consulting.     Java PostgreSQL Enhydra Python Zope Perl Apache
LinuxBSD...

Re: Lightweight locking primitive

From

Igor Kovalenko

Date:

12 March 2002, 23:41:43

Matthew Kirkwood wrote:
> 
> On Tue, 12 Mar 2002, Bruce Momjian wrote:
> 
> > > They're basically the same thing.  Currently, pthread_mutexes on Linux
> > > (implemented in glibc) are fairly gross in the contended case, since
> > > there is no clean way to wait for lock release,
> 
> > Strange that it doesn't wait for the lock.
> [..]
> 
> It does wait, in that the call will not return before or unless
> the thread has acquired the lock.  However, it waits in an ugly
> way, via spin-and-yield or some evil signal or pipe hackery via
> a manager thread.
> 
> pthread_mutexes are fairly ugly, but they should still be
> lightweight.  Until now, there was no way to do that under
> Linux.  (I don't know how the other free Unixes do it, but I
> suspect it is not much better.)

If all free Unixes do it in such an ugly way then well, what you get is
what you paid for ;) 
I still would be surprized if all implementations were as bad as Linux
one is. Pthread mutexes are very lightweight and fast on Solaris and QNX
(I mostly work with those). They can be shared across processes on both.
Implementation-wise, QNX has corresponding blocking state so when some
thread locks a mutex other contenders get blocked by kernel. They are
(one of them) unblocked by kernel when mutex is released.

Speaking about ugliness, the only issue I see with pthread mutexes is
that they can get orphaned. There is no portable way to deal with that,
but again both Solaris and QNX have extended API which allows some
thread to aqcuire ownership of an orphaned mutex. I guess that
eventually will make its way into POSIX.

-- igor

Re: Lightweight locking primitive

From

"Hubertus Franke"

Date:

13 March 2002, 14:54:54

Biggest difference, FUTEX work across address spaces, pthread_mutexes don't
!

Hubertus Franke
Enterprise Linux Group (Mgr),  Linux Technology Center (Member Scalability)
, OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003

Igor Kovalenko <Igor.Kovalenko@motorola.com> on 03/12/2002 04:48:23 PM

To:
cc:    pgsql-hackers@postgresql.org, Hubertus Franke/Watson/IBM@IBMUS,      rusty@rustcorp.com.au
Subject:    Re: [HACKERS] Lightweight locking primitive

Matthew Kirkwood wrote:
>
> Hi,
>
> It seems that the Linux kernel will very shortly acquire a lightweight
> userlevel locking primitive (called futexes), thanks primarily to Rusty
> and Hubertus.  It looks to be very useful for the sort of locking that
> databases of various types need to do.
>
> They have a bunch of rather nice properties:

I am curious how 'futexes' are different/better than POSIX (pthread
style) mutexes?

>
> a) low overhead in the no-contention case - a single locked
>    instruction on i386

should be same for pthread_mutex_lock()

> b) no kernel overhead for non-contended locks - make as
>    many as you like, the kernel memory cost is only
>    O(number of locks with waiters)

Well it can't have kernel overhead for non-contended locks if a
non-contended lock is one opcode, can it?

> c) are interruptible / restartable across signals

Not sure what 'restartable' means? Do you mean locking primitives would
restarted by kernel when interrupted by signals? Like kernel calls with
SA_RESTART set? How that would be possible if kernel does not even know
about non-contended locks?

> d) the name :-)
>
> They don't do:
>
> a) deadlock detection
> b) cleanup on process exit -- the kernel doesn't know who
>    had the lock, so it can't help here
>
> A reader/writer version is available, though it's currently implemented
> with two futexes.  Spin-for-a-while-before-sleeping versions are planned.
>

RW locks are defined by POSIX too and can be implemented by mutex +
condvar. I wonder what is wrong with those... At the same time Linux has
POSIX semaphores which can not be shared across processes, making them
quite useless. Fixing that could help postgres quite a bit more I
think...

-- igor