Thread: Switch to unnamed POSIX semaphores as our preferred sema code?
I've gotten a bit tired of seeing "could not create semaphores: No space left on device" failures in the buildfarm, so I looked into whether we should consider preferring unnamed POSIX semaphores over SysV semaphores. We've had code for named and unnamed POSIX semaphores in our tree for a long time, but it's not actually used on any current platform AFAIK. There are good reasons to avoid the named-semaphore variant: typically that eats a file descriptor per sema per backend. However that complaint doesn't necessarily apply to unnamed semaphores. Indeed, it seems that on Linux an unnamed POSIX semaphore is basically a futex, which eats zero kernel resources; all the state is in userspace. Although in normal cases the semaphore code paths aren't very heavily exercised in our code, I was able to get a measurable performance difference by building with --disable-spinlocks, so that spinlocks are emulated with semaphores. On an 8-core RHEL6 machine, "pgbench -S -c 20 -j 20" seems to be about 4% faster with unnamed semaphores than SysV semaphores. It'd be good to replicate that test on some higher-end hardware, but provisionally I'd say unnamed semaphores are faster. The data structure is bigger: Linux's type sem_t is 32 bytes on 64-bit machines (16 bytes on 32-bit) whereas we use 8 bytes for SysV semaphores. But there aren't normally a huge number of semaphores in a cluster, and anyway this comparison is cheating because it ignores the space taken for the kernel data structures backing the SysV semaphores. There was some previous discussion about this in https://www.postgresql.org/message-id/flat/20160621193412.5792.65085%40wrigleys.postgresql.org but that thread tailed off without a resolution, partly because it wasn't the kind of change we'd consider making in late beta. One thing I expressed concern about there was whether there are any hidden kernel resources underlying an unnamed semaphore. 
So far as I can tell by strace'ing sem_init and sem_destroy, there are not, at least on Linux. Another issue is raised in today's discussion https://www.postgresql.org/message-id/flat/14947.1475690465%40sss.pgh.pa.us where it appears that we might need to be more careful about putting memory barriers into the unnamed-semaphore code (probably because it might not enter the kernel). But if that's a bug, we'd want to fix it anyway, IMO. So for Linux, I think probably we should switch. macOS seems not to have unnamed POSIX semaphores, only named ones (the functions exist, but they always fail with ENOSYS). However, some googling suggests that other BSD derivatives do have these primitives, so somebody ought to do a similar comparison on them to see if switching is a win. (The first thread above asserts that it is for FreeBSD, but someone should recheck using a test case that stresses semaphores more.) Dunno about other platforms. sem_init is nominally required by SUS v2, but it doesn't seem to actually exist everywhere, so I doubt we can drop SysV altogether. I'd be inclined to change the default on a platform-by-platform basis, not whole hog. If anyone wants to test, the main thing you have to do to try this in the existing code is to add "USE_UNNAMED_POSIX_SEMAPHORES=1" and "--disable-spinlocks" to your configure arguments. On Linux you may need to add -lrt to the backend LIBS list, though on my machine configure is putting that in already. regards, tom lane
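For readers unfamiliar with the primitive under discussion, here is a minimal sketch of an unnamed POSIX semaphore shared between two processes. It is not PostgreSQL's semaphore code; the function name unnamed_sem_roundtrip is made up for illustration, and it assumes a platform where sem_init accepts pshared = 1 and mmap supports MAP_ANONYMOUS (Linux does both).

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, not PostgreSQL code: returns 0 if a child process
 * can wake its parent through an unnamed POSIX semaphore placed in
 * shared memory, -1 otherwise.
 */
int
unnamed_sem_roundtrip(void)
{
    sem_t  *sem;
    pid_t   pid;
    int     status;
    int     result = -1;

    /* The semaphore must live in memory that both processes map. */
    sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sem == MAP_FAILED)
        return -1;

    /* pshared = 1 requests cross-process use; the initial count is 0. */
    if (sem_init(sem, 1, 0) != 0)
    {
        munmap(sem, sizeof(sem_t));
        return -1;              /* e.g. ENOSYS where unsupported */
    }

    pid = fork();
    if (pid == 0)
    {
        /* Child: release the semaphore once, then exit. */
        _exit(sem_post(sem) == 0 ? 0 : 1);
    }
    if (pid > 0)
    {
        /* Parent: block until the child posts, then reap the child. */
        if (sem_wait(sem) == 0 &&
            waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = 0;
    }

    sem_destroy(sem);
    munmap(sem, sizeof(sem_t));
    return result;
}
```

Calling unnamed_sem_roundtrip() from a small main() and checking for 0 is enough to try it. Older glibc versions may need -pthread or -lrt at link time, in line with the -lrt remark above.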
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
> I've gotten a bit tired of seeing "could not create semaphores: No space
> left on device" failures in the buildfarm, so I looked into whether we should
> consider preferring unnamed POSIX semaphores over SysV semaphores.

+100 Wonderful decision and cautious analysis. This will make PostgreSQL more friendly to users, especially newcomers, by eliminating the need to tune kernel resources. I wish other kernel resources (files, procs) needed no tuning, as on Windows, but that's just a daydream.

Regards
Takayuki Tsunakawa
I wrote:
> Although in normal cases the semaphore code paths aren't very heavily
> exercised in our code, I was able to get a measurable performance
> difference by building with --disable-spinlocks, so that spinlocks are
> emulated with semaphores. On an 8-core RHEL6 machine, "pgbench -S -c 20
> -j 20" seems to be about 4% faster with unnamed semaphores than SysV
> semaphores. It'd be good to replicate that test on some higher-end
> hardware, but provisionally I'd say unnamed semaphores are faster.

I realized that the above test is probably bogus, or at least not very relevant to real-world Postgres performance. A key performance aspect of Linux futexes is that uncontended lock acquisitions, as well as releases that don't need to wake anyone, don't enter the kernel at all. However, in PG's normal use of semaphores, neither scenario occurs very often; processes lock their semaphores only after determining that they need to wait, and release semaphores only when it's known they'll waken a sleeper. The futex fast-path cases can occur only in the race condition that someone else awakens a would-be waiter before it actually reaches its semop call. However, uncontended locks and releases *are* very common for spinlocks. This means that testing with --disable-spinlocks will show a futex performance benefit that's totally irrelevant for real cases.

Based on that analysis, I abandoned testing with --disable-spinlocks and instead tried to measure the actual speed of contended heavyweight lock acquisition/release. I used

	pgbench -f lockscript.sql -c 20 -j 20 -T 60 bench

with the script being

	begin;
	lock table pgbench_accounts;
	commit;

I got speeds between 10500 and 10800 TPS with either semaphore API; if there's any difference at all, it's below the noise level for this test scenario. So I'm now thinking there's basically no performance consideration here, and the point of switching would just be to get out from under SysV kernel resource limits.
(Again though, this applies only to Linux --- the other thread I cited suggests things might be quite different on FreeBSD for instance.) Can anyone think of a test case that would stress semaphore operations more heavily, without being unrealistic? regards, tom lane
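The fast-path argument in this message can be illustrated with a toy model. This is not glibc's sem_wait and not PostgreSQL code; sema_try_fast_acquire is a hypothetical name, and the sketch only shows why an uncontended acquire can finish without a system call, while a zero count forces the caller to sleep in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of a semaphore fast path: try to decrement the
 * count entirely in userspace.  Returns true on success; false means a
 * real implementation would now have to sleep in the kernel.
 */
bool
sema_try_fast_acquire(atomic_int *count)
{
    int     cur = atomic_load(count);

    while (cur > 0)
    {
        /* On failure, cur is refreshed with the current value. */
        if (atomic_compare_exchange_weak(count, &cur, cur - 1))
            return true;        /* uncontended: no system call needed */
    }
    return false;               /* count is zero: caller must wait */
}
```

Under the usage pattern described above, a backend calls the lock operation only after deciding it must wait, so it would almost always take the "return false" branch. That is why the spinlock-emulation benchmark overstated the benefit.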
On Thu, Oct 6, 2016 at 9:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can anyone think of a test case that would stress semaphore operations
> more heavily, without being unrealistic?

I think it's going to be pretty hard to come up with a non-artificial test case that exhibits meaningful lwlock contention on an 8-core system. If you go back to 9.1, before we had fast-path locking, you can do it, because the relation locks and vxid locks do cause noticeable contention on the lock manager locks in that version.

Alternatively, you might try something like "pgbench -n -S -c $N -j $N" with a scale factor that doesn't fit in shared buffers. This probably won't produce significant contention because there are 128 LWLocks and only 8 cores, but you could reduce the number of buffer mapping LWLocks to, say, 4 and then you'd probably hit it fairly hard.

Alternatively, get a bigger box. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> Alternatively, get a bigger box. :-)

So what's it take to get access to hydra?

regards, tom lane
On Thu, Oct 6, 2016 at 5:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Alternatively, get a bigger box. :-)
>
> So what's it take to get access to hydra?

Send me a private email with your .ssh key.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Oct 6, 2016 at 9:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Can anyone think of a test case that would stress semaphore operations
>> more heavily, without being unrealistic?

> I think it's going to be pretty hard to come up with a non-artificial
> test case that has exhibits meaningful lwlock contention on an 8-core
> system. If you go back to 9.1, before we had fast-path locking, you
> can do it, because the relation locks and vxid locks do cause
> noticeable contention on the lock manager locks in that version.
> ...
> Alternatively, get a bigger box. :-)

Well, I did both of the above. I tried 9.1 on "hydra", that 60-processor POWER7 box, and cranked the parallelism up to ridiculous levels:

	pgbench -S -j 250 -c 250 -M prepared -T 60 bench

Median of 3 runs with sysv semaphores:

	number of transactions actually processed: 1554570
	tps = 25875.432836 (including connections establishing)
	tps = 25894.938187 (excluding connections establishing)

Ditto, for unnamed POSIX semaphores:

	number of transactions actually processed: 1726696
	tps = 28742.486104 (including connections establishing)
	tps = 28765.963071 (excluding connections establishing)

That's about a 10% win for POSIX semaphores. Now, at saner loads, I couldn't see much of any difference between the two semaphore APIs. So I'm still of the opinion that there's not likely to be any meaningful performance difference in practice, at least not on reasonably recent Linux machines. But this does indicate that if there is any difference, it will probably favor switching.

regards, tom lane
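As a quick check on the "about 10%" figure, the relative gain can be computed from the two median TPS values reported above (the "including connections establishing" numbers). The helper name relative_gain is made up for this sketch.

```c
/*
 * Hypothetical helper: fractional improvement of new_tps over base_tps,
 * e.g. 0.10 for a 10% gain.
 */
double
relative_gain(double base_tps, double new_tps)
{
    return (new_tps - base_tps) / base_tps;
}
```

With base_tps = 25875.432836 and new_tps = 28742.486104 this comes out at roughly 0.11, i.e. a gain of about 11%, consistent with the rounded figure in the message.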
Re: Tom Lane 2016-10-08 <29244.1475959928@sss.pgh.pa.us>
> So I'm still of the opinion that there's not likely to be any meaningful
> performance difference in practice, at least not on reasonably recent
> Linux machines. But this does indicate that if there is any difference,
> it will probably favor switching.

Another data point that's admittedly much more of a footnote than serious input to the original question is the following: Debian has a (so far mostly toy) port "hurd-i386" which is using the GNU hurd kernel along with the usual GNU userland that's also in use on Linux. This OS doesn't implement any semaphores yet (PG compiles, but initdb dies with ENOSYS immediately). On talking to the porters, they advised that POSIX semaphores would have the best chances to get implemented first, so I added USE_UNNAMED_POSIX_SEMAPHORES=1 to the architecture template to be prepared for that.

Christoph

(The patch quoted below is obviously Debian-specific and not meant for inclusion upstream.)

hurd doesn't support sysv semaphores (semget), and needs -pthread to find sem_init. POSIX semaphores shared between processes (sem_init(pshared = 1)) aren't supported yet either, but have the best chance to get implemented, so be prepared.

FATAL: could not create semaphores: Function not implemented
DETAIL: Failed system call was semget(1, 17, 03600).

undefined reference to symbol 'sem_init@@GLIBC_2.12'

--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -109,6 +109,10 @@ endif
 endif # aix
 
+ifeq ($(shell dpkg-architecture -qDEB_HOST_ARCH_OS), hurd)
+LIBS += -pthread
+endif # hurd
+
 # Update the commonly used headers before building the subdirectories
 $(SUBDIRS:%=%-recursive): | generated-headers
--- a/src/template/linux
+++ b/src/template/linux
@@ -28,3 +28,10 @@ if test "$SUN_STUDIO_CC" = "yes" ; then
   ;;
   esac
 fi
+
+# force use of POSIX instead of SysV semaphores on hurd-i386
+case $(dpkg-architecture -qDEB_HOST_ARCH) in
+	hurd*)
+		USE_UNNAMED_POSIX_SEMAPHORES=1
+		;;
+esac
Christoph Berg <myon@debian.org> writes:
> Another data point that's admittedly much more of a footnote than
> serious input to the original question is the following: Debian has a
> (so far mostly toy) port "hurd-i386" which is using the GNU hurd
> kernel along with the usual GNU userland that's also in use on Linux.
> This OS doesn't implement any semaphores yet (PG compiles, but initdb
> dies with ENOSYS immediately). On talking to the porters, they advised
> that POSIX semaphores would have the best chances to get implemented
> first, so I added USE_UNNAMED_POSIX_SEMAPHORES=1 to the architecture
> template to be prepared for that.

As of HEAD, that should happen automatically for anything using the "linux" template.

I did some googling (but no actual testing) to try to find out the state of POSIX sema support for the other platform templates:

aix
	AIX doesn't seem to have support (reportedly, the functions exist but always fail).

cygwin
	Not clear whether unnamed semas work on this; I found conflicting reports.

darwin
	Unnamed semas are known not to work here.

hpux
	Reportedly, unnamed POSIX sema support exists on HPUX 11.x, but on 10.x sem_init fails with ENOSYS. We'd need a run-time test in configure to see whether to use it. Doubt it's worth the trouble.

netbsd
	No support for cross-process unnamed semas.

openbsd
	No support for cross-process unnamed semas.

sco
	Doubt anyone cares.

solaris
	Apparently supported in newer versions of Solaris; as with HPUX, we might need a run-time configure probe to tell. Again, without specific evidence that it might be worth switching, I doubt it's worth taking any trouble over.

unixware
	Doubt anyone cares.

win32
	No support.

So at this point it seems likely that stopping with Linux and FreeBSD is the thing to do, and as far as I can tell the code we have now is working with all variants of those that we have in the buildfarm.
(I'm a little suspicious that older variants of FreeBSD might not have working sem_init, like the other *BSD variants, necessitating a run-time test there. But we'll cross that bridge when we come to it.) So, barring further input, this project is done. I'll go update the user docs to explain the new state of affairs. regards, tom lane
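The kind of run-time probe mentioned for HPUX and Solaris might look roughly like the sketch below. This is not what configure does today and the function name probe_pshared_sem_init is made up; it only checks whether sem_init accepts pshared = 1, and does not verify that the semaphore actually works across processes.

```c
#include <errno.h>
#include <semaphore.h>

/*
 * Hypothetical probe, not part of PostgreSQL's configure: returns 0 if
 * sem_init() accepts a process-shared unnamed semaphore, otherwise the
 * errno it reported (ENOSYS on the platforms described above).
 */
int
probe_pshared_sem_init(void)
{
    sem_t   sem;

    if (sem_init(&sem, 1, 0) != 0)
        return errno;           /* unsupported or otherwise failed */

    sem_destroy(&sem);
    return 0;
}
```

A probe like this has to be executed on the target machine, which is the main cost of doing the check at configure time rather than picking a default per platform template.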
On Tue, Oct 11, 2016 at 5:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So at this point it seems likely that stopping with Linux and FreeBSD
> is the thing to do, and as far as I can tell the code we have now is
> working with all variants of those that we have in the buildfarm.
> (I'm a little suspicious that older variants of FreeBSD might not
> have working sem_init, like the other *BSD variants, necessitating
> a run-time test there. But we'll cross that bridge when we come
> to it.)

The sem_init man page from FreeBSD 8.4[1] (EOL August 2015) and earlier said:

	This implementation does not support shared semaphores, and reports this
	fact by setting errno to EPERM.

FreeBSD 9.0 (released January 2012) reimplemented semaphores and removed those words from that man page[2]. All current releases[3] support it, though I guess there may be 8.4 machines out there a year and a bit after EOL.

[1] https://www.freebsd.org/cgi/man.cgi?query=sem_init&apropos=0&sektion=0&manpath=FreeBSD+8.4-RELEASE&arch=default&format=html
[2] https://www.freebsd.org/cgi/man.cgi?query=sem_init&apropos=0&sektion=0&manpath=FreeBSD+9.0-RELEASE&arch=default&format=html
[3] https://www.freebsd.org/releases/

--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Oct 11, 2016 at 5:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (I'm a little suspicious that older variants of FreeBSD might not
>> have working sem_init, like the other *BSD variants, necessitating
>> a run-time test there. But we'll cross that bridge when we come
>> to it.)

> The sem_init man page from FreeBSD 8.4[1] (EOL August 2015) and earlier said:
> This implementation does not support shared semaphores, and reports this
> fact by setting errno to EPERM.
> FreeBSD 9.0 (released January 2012) reimplemented semaphores and
> removed those words from that man page[2].

Yeah, in subsequent googling I found other mentions of this having been added in FreeBSD 9.0. But that will be more than 5 years old by the time PG 10 gets out.

> All current releases[3] support it, though I guess there may be 8.4
> machines out there a year and a bit after EOL.

We don't have anything older than 9.0 in the buildfarm, which I take to indicate that nobody particularly cares about older versions anymore. I would just as soon not add a run-time test in configure (it breaks cross-compiles), so I'd rather wait and see if anyone complains.

regards, tom lane