Hi,
It seems that the Linux kernel will very shortly acquire a lightweight
userlevel locking primitive (called futexes), thanks primarily to Rusty
and Hubertus. It looks to be very useful for the sort of locking that
databases of various types need to do.
They have a bunch of rather nice properties:
a) low overhead in the no-contention case - a single locked instruction on i386
b) no kernel overhead for non-contended locks - make as many as you like, the kernel memory cost is only O(number of
lockswith waiters)
c) are interruptible / restartable across signals
d) the name :-)
They don't do:
a) deadlock detection
b) cleanup on process exit -- the kernel doesn't know who had the lock, so it can't help here
A reader/writer version is available, though it's currently implemented
with two futexes. Spin-for-a-while-before-sleeping versions are planned.
The API looks like this:
/* these can be stored anywhere -- mmapped file, * sysv shm, shared anonymous memory */struct futex lock;
/* this does mprotect(.., ...|PROT_SEM, ...); * seemingly some architectures need to do odd * things to get
atomic/coherentmemory */if(futex_region(lock, sizeof(lock))) fail("futexes not available on this kernel");
futex_init(&lock);
...
/* grab the lock -- we also have a futex_trydown */if(futex_down(&lock)) fail("couldn't get lock");
/* critical section */
/* release lock */futex_up(&lock);
We're looking for interesting applications to try this stuff out on.
Are there:
a) parts of postgresql which would like this, or
b) changes to the interface (or feature set) which would make it more suited
?
The LWLocks look not unlike this, but my impression is that they are
low-contention, so any improvement would be small.
Any pointers into appropriate bits of source would be greatly appreciated.
Thanks,
Matthew.